Introduction

The IDT79R4600 (R4600) and IDT79R4700 (R4700) support a wide 
variety of processor-based applications. Because of their low power 
consumption, coupled with high performance, they are well suited for a 
wide variety of embedded applications, including laser printers, 
X-terminals, internetworking equipment, imaging equipment, and high- 
end video games. The R4600 and R4700 are also well-suited to high- 
performance desktop applications such as Windows™ NT desktop and 
notebook systems, and 3-D workstations. 

Compatible with the IDT79R4400PC family for both hardware and 
software, the R4600 and R4700 will serve in many of the same 
applications, but in addition support low-power operation for applications 
such as notebook computers. 

Floating Point 

The R4700 has improved FPA multiply operations. All other features of 
the R4700 are the same as those in the R4600. In this manual, these two 
products are referred to collectively as the R4600/R4700, except when 
information pertains only to one of them. In that situation they are 
referred to individually. 

Secondary Cache 

The R4600/R4700 does not provide integrated secondary cache and 
multiprocessor support as found in the R4000SC and R4000MC, but it is 
simple to build an external secondary cache. For most embedded 
applications, however, the large on-chip, two-way set associative caches 
make this unnecessary. 

Performance 

The R4600/R4700 brings R4000SC performance levels to the R4000PC 
package, while at the same time providing lower cost and lower power. It 
does this by providing larger on-chip caches that are two-way set 
associative, fewer pipeline stalls, and early restart for data cache misses. 
The result is higher performance than for an R4000 at the same frequency 
and for the same system latencies (exact figures are system dependent). 

Upward Compatibility 

The R4600/R4700 provides complete upward application-software 
compatibility with the IDT79R3000 family of microprocessors, including 
the IDT79R3000A and the IDT RISController™ family (IDT79R30xx family) 
as well as the IDT79R4000 family of microprocessors. Microsoft
Windows™ NT and UNISOFT Unix™ V.4 operating systems ensure the
availability of thousands of applications programs, geared to provide a 
complete solution to a large number of processing needs. An array of 
development tools facilitates the rapid development of R4600/R4700- 
based systems, enabling a wide variety of customers to take advantage of 
the MIPS Open Architecture philosophy. 

Together with the R4400, the R4600/R4700 provides a compatible, 
timely, and necessary evolution path from 32-bit to true, 64-bit 
computing. The original design objectives of the R4000 clearly mandated 
this evolution path; the result is a true 64-bit processor fully compatible 
with 32-bit operating systems and applications. 

The R4600/R4700 enables 32-bit applications to access 64-bit compute 
power painlessly. The software tools support a wide variety of models, 
including 32-bit address and data, 64-bit address and data, and 32-bit 
address/64-bit data. 32-bit address/data enables applications to be 
migrated without "cleaning up" some software. 

The R4600/R4700 offers high performance, large caches, and MMU and
FPA functions to these systems. For desktop systems, the R4600/R4700 
supports a full migration to 64-bit, allowing 64-bit systems to execute true 
64-bit or older 32-bit applications. For embedded applications, the power 
and bandwidth of 64-bit data types can be used without the memory 
expansion of 64-bit addressing. 

The list on the following page summarizes the R4600/R4700 features. 
For a feature-by-feature comparison with the R4000, refer to the tables 
beginning on page 1-23. 

Features

• True 64-bit microprocessor 

- 64-bit integer operations 

- 64-bit floating-point operations 

- 64-bit registers 

- 64-bit virtual address space 

• High-performance microprocessor 

- For R4600: 133 peak MIPS at 133MHz
  For R4700: 175 peak MIPS at 175MHz

- For R4600: 44 peak MFLOP/s at 133MHz
  For R4700: 87 peak MFLOP/s at 175MHz

- For R4600: 109 SPECint92 and 83 SPECfp92 at 150MHz
  For R4700: 132 SPECint92 and 94 SPECfp92 at 175MHz

- Large two-way set associative caches on-chip 

• Improved FPA multiply performance (R4700 only) 

- 1 mul, 1 add every 4 clock cycles 

• High level of integration 

- 64-bit integer CPU 

- 64-bit floating-point unit 

- 16KB instruction cache; 16KB data cache 

- Flexible MMU with large TLB 

• Low-power operation 

- 3.3V or 5V power supply options 

- For R4600: 25mW/MHz internal power dissipation (2.5W @ 100MHz, 3.3V)
  For R4700: 24mW/MHz internal power dissipation (2.4W @ 100MHz, 3.3V)

- Standby mode reduces internal power to 400mW 

• Fully software compatible with R4000 Processor Family 

• Standard operating system support includes: 

- Microsoft Windows NT 

- UNISOFT Unix™ System V.4 

- JMI C-executive 

- VxWorks

• Available in 179-pin PGA or 208-pin MQUAD 

• Input and output clock frequency: 

- Input clock at one-half pipeline frequency 

- Output clock is a programmable divisor of the pipeline frequency 

- Selectable bus frequency 

- Ratios of 1/2 ... 1/8 of pipeline rate

• 64GB physical address space 

• Processor family for a wide variety of applications 

- Desktop workstations and PCs 

- Deskside or departmental servers 

- Routers 

- High-performance embedded applications 

- Notebooks 

• Large number of development tools, including: 

- Cross compilers 

- Logic models 

- Logic analyzer support 

Device Overview

The R4600/R4700 family brings a high level of integration designed for
high-performance and high-bandwidth computing. The key elements of
the R4600/R4700 are briefly described below, with more detailed
information on each block presented in subsequent chapters.

Figure 1.1 shows a block level representation of the functional units 
within the R4600/R4700. 

Figure 1.1 R4600/R4700 Block Diagram

Pipeline Overview 

The R4600/R4700 uses a 5-stage pipeline similar to the IDT79R3000. 
The simplicity of this pipeline allows the R4600/R4700 to be lower-cost 
and lower-power than super-scalar or super-pipelined processors. Unlike 
the R3000, the R4600/R4700 does virtual-to-physical translation in 
parallel with cache access. This allows the R4600/R4700 to operate at over 
twice the frequency of the R3000 and to support a larger TLB for address 
translation. 

Compared to the 8-stage R4000 pipeline, the R4600/R4700 is more 
efficient (requires fewer stalls). This is because the branch and load latency 
for the R4600/R4700 is shorter than for the R4000 (both are 2 cycles for 
the R4600/R4700 but are 3 and 4 cycles respectively for the R4000). 

The internal pipeline of the R4600/R4700 processor operates at twice
the frequency of the master clock, as discussed in Chapter 3. The
processor achieves high throughput by pipelining cache accesses,
shortening register access times, implementing virtual-indexed primary
caches, and allowing the latency of certain functional units to span more
than one pipeline clock cycle.

Refer to Chapter 3 for a detailed discussion of the CPU pipeline 
operation, including descriptions of the delay instructions, interruptions 
to the pipeline flow caused by interlocks and exceptions, and the R4600/ 
R4700 implementation of a store buffer. Refer to Chapter 6 for a detailed 
discussion of the FPU pipeline. 

CPU Register Overview 

The R4600/R4700 has thirty-two general purpose registers. These 
registers are used for scalar integer operations and address calculation. 
The register file consists of two read ports and one write port, and is fully 
bypassed to minimize operation latency in the pipeline. 

Figure 1.2 shows the R4600/R4700 CPU registers. 

Figure 1.2 R4600/R4700 CPU Registers

Two of the CPU general purpose registers have assigned functions: 

• r0 is hardwired to a value of zero, and can be used as the target
register for any instruction whose result is to be discarded. r0 can
also be used as a source when a zero value is needed.

• r31 is used as an implicit return destination address register by the
JAL and BAL series of instructions.

The CPU has three special purpose registers: 

• PC — Program Counter register 

• HI — Multiply and Divide register higher result 

• LO — Multiply and Divide register lower result 

The two Multiply and Divide registers (HI, LO) store:

• the product of integer multiply operations, or

• the quotient (in LO) and remainder (in HI) of integer divide operations.

The R4600/R4700 processor has no Program Status Word (PSW) register
as such; this is covered by the Status and Cause registers incorporated
within the System Control Coprocessor (CP0). CP0 registers are described
later in this chapter.
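
The use of HI and LO described above can be modelled in a few lines. The following is a minimal Python sketch, not the processor's implementation: it models only the 32-bit signed MULT and DIV instructions, the function names are hypothetical, and it assumes that division truncates toward zero (as in C), which this chapter does not state.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def mult(rs, rt):
    """Model of signed 32-bit MULT: return (HI, LO) halves of the 64-bit product."""
    product = (to_signed32(rs) * to_signed32(rt)) & 0xFFFFFFFFFFFFFFFF
    return (product >> 32) & MASK32, product & MASK32


def div(rs, rt):
    """Model of signed 32-bit DIV: return (HI, LO) = (remainder, quotient)."""
    dividend, divisor = to_signed32(rs), to_signed32(rt)
    if divisor == 0:
        # The sketch simply refuses; it does not model divide-by-zero results.
        raise ValueError("divide by zero is not modelled")
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient  # assumed: truncation toward zero
    remainder = dividend - quotient * divisor
    return remainder & MASK32, quotient & MASK32
```

For example, `mult(0x10000, 0x10000)` leaves 1 in HI and 0 in LO, and `div(7, 2)` leaves the remainder 1 in HI and the quotient 3 in LO.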

CPU Instruction Set Overview

Each CPU instruction is 32 bits long. As shown in Figure 1.3, there are 
three instruction formats: 

• immediate (I-type) 

• jump (J-type) 

• register (R-type) 

Figure 1.3 CPU Instruction Formats

Each format contains a number of different instructions, which are 
described further in this chapter. Fields of the instruction formats are 
described in Chapter 2. 

Instruction decoding is simplified by limiting the number of formats to 
these three. This limitation means that the more complicated (and less 
frequently used) operations and addressing modes can be synthesized by 
the compiler, using sequences of these same simple instructions. 
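
Because there are only three formats, a decoder can extract every field with fixed shifts and masks. The Python sketch below illustrates this; the bit positions are the standard MIPS field layout (this chapter defers the field definitions to Chapter 2), and the function name `decode` is hypothetical.

```python
def decode(word):
    """Split a 32-bit instruction word into the fields used by the three formats.

    I-type uses opcode, rs, rt, immediate.
    J-type uses opcode, target.
    R-type uses opcode, rs, rt, rd, shamt, funct.
    """
    return {
        "opcode": (word >> 26) & 0x3F,   # bits 31..26, present in all formats
        "rs": (word >> 21) & 0x1F,       # bits 25..21
        "rt": (word >> 16) & 0x1F,       # bits 20..16
        "rd": (word >> 11) & 0x1F,       # bits 15..11 (R-type)
        "shamt": (word >> 6) & 0x1F,     # bits 10..6  (R-type)
        "funct": word & 0x3F,            # bits 5..0   (R-type)
        "immediate": word & 0xFFFF,      # bits 15..0  (I-type)
        "target": word & 0x03FFFFFF,     # bits 25..0  (J-type)
    }
```

For instance, the word 0x8C220008 decodes as opcode 0x23 (LW) with rs = 1, rt = 2, and immediate = 8, that is, a load of r2 from offset 8 off base register r1.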

The instruction set can be further divided into the following groupings: 

• Load and Store instructions move data between memory and general
registers. They are all immediate (I-type) instructions, since the only
addressing mode supported is base register plus 16-bit, signed
immediate offset.

• Computational instructions perform arithmetic, logical, shift,
multiply, and divide operations on values in registers. They include
register (R-type, in which both the operands and the result are stored
in registers) and immediate (I-type, in which one operand is a 16-bit
immediate value) formats.

• Jump and Branch instructions change the control flow of a program.
Jumps are always made to a paged, absolute address formed by
combining a 26-bit target address with the high-order bits of the
Program Counter (J-type format) or to a register address (R-type
format). Branches have 16-bit offsets relative to the program counter
(I-type). Jump And Link instructions save their return address in
register 31.

• Coprocessor instructions perform operations in the coprocessors.
Coprocessor load and store instructions are I-type.

• Coprocessor 0 (system coprocessor) instructions perform operations
on CP0 registers to control the memory management and exception
handling facilities of the processor and the standby mode for power
management. These are listed in Table 1.17.

• Special instructions perform system calls and breakpoint operations.
These instructions are always R-type.

• Exception instructions cause a branch to the general
exception-handling vector based upon the result of a comparison. These
instructions occur in both R-type (both the operands and the result are
registers) and I-type (one operand is a 16-bit immediate value)
formats.
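
The address calculations named in the list above (base plus signed offset, paged jump target, PC-relative branch) can be sketched as follows. This is an illustrative Python model with hypothetical function names. Two details come from the general MIPS ISA definition rather than from this chapter: word-aligned targets are formed by shifting the 26-bit and 16-bit fields left by two, and both jumps and branches are computed relative to the address of the delay slot (PC + 4).

```python
def sign_extend16(value):
    """Sign-extend a 16-bit field to a Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def load_store_address(base, offset16, bits=32):
    """Base register plus 16-bit signed immediate offset."""
    return (base + sign_extend16(offset16)) & ((1 << bits) - 1)


def jump_target(pc, target26, bits=32):
    """J-type: 26-bit target combined with the high-order bits of PC + 4."""
    delay_slot = pc + 4
    region = delay_slot & ~0x0FFFFFFF          # keep the high-order bits
    return (region | ((target26 & 0x03FFFFFF) << 2)) & ((1 << bits) - 1)


def branch_target(pc, offset16, bits=32):
    """I-type branch: signed 16-bit word offset relative to PC + 4."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & ((1 << bits) - 1)
```

With these definitions, an offset field of 0xFFFC applied to base 0x1000 addresses 0x0FFC, and a jump at 0x80001000 with target field 0x100 lands at 0x80000400, inside the same 256MB page as the jump itself.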

Chapter 2 provides more detail about these instructions, and Appendix
A gives a complete description of each.

Table 1.1 through Table 1.16 list CPU instructions common to MIPS
R-Series processors. The last column in each table gives the MIPS ISA
level in which the instruction first appeared. Table 1.17 lists CP0
instructions.



OpCode   Description              MIPS ISA Level (1)

LB       Load Byte                I
LBU      Load Byte Unsigned       I
LH       Load Halfword            I
LHU      Load Halfword Unsigned   I
LW       Load Word                I
LWL      Load Word Left           I
LWR      Load Word Right          I
SB       Store Byte               I
SH       Store Halfword           I
SW       Store Word               I
SWL      Store Word Left          I
SWR      Store Word Right         I

Note 1: For Tables 1.1 through 1.17, this column refers to the level in
which the instruction first appeared.

Table 1.1 CPU Instruction Set: Load and Store Instructions



OpCode   Description                          MIPS ISA Level

ADDI     Add Immediate                        I
ADDIU    Add Immediate Unsigned               I
SLTI     Set on Less Than Immediate           I
SLTIU    Set on Less Than Immediate Unsigned  I
ANDI     AND Immediate                        I
ORI      OR Immediate                         I
XORI     Exclusive OR Immediate               I
LUI      Load Upper Immediate                 I

Table 1.2 CPU Instruction Set: Arithmetic Instructions (ALU Immediate)

OpCode   Description                MIPS ISA Level

ADD      Add                        I
ADDU     Add Unsigned               I
SUB      Subtract                   I
SUBU     Subtract Unsigned          I
SLT      Set on Less Than           I
SLTU     Set on Less Than Unsigned  I
AND      AND                        I
OR       OR                         I
XOR      Exclusive OR               I
NOR      NOR                        I

Table 1.3 CPU Instruction Set: Arithmetic (3-Operand, R-Type)



OpCode   Description        MIPS ISA Level

MULT     Multiply           I
MULTU    Multiply Unsigned  I
DIV      Divide             I
DIVU     Divide Unsigned    I
MFHI     Move From HI       I
MTHI     Move To HI         I
MFLO     Move From LO       I
MTLO     Move To LO         I

Table 1.4 CPU Instruction Set: Multiply and Divide Instructions



OpCode   Description                                       MIPS ISA Level

J        Jump                                              I
JAL      Jump And Link                                     I
JR       Jump Register                                     I
JALR     Jump And Link Register                            I
BEQ      Branch on Equal                                   I
BNE      Branch on Not Equal                               I
BLEZ     Branch on Less Than or Equal to Zero              I
BGTZ     Branch on Greater Than Zero                       I
BLTZ     Branch on Less Than Zero                          I
BGEZ     Branch on Greater Than or Equal to Zero           I
BLTZAL   Branch on Less Than Zero And Link                 I
BGEZAL   Branch on Greater Than or Equal to Zero And Link  I

Table 1.5 CPU Instruction Set: Jump and Branch Instructions



OpCode   Description                      MIPS ISA Level

SLL      Shift Left Logical               I
SRL      Shift Right Logical              I
SRA      Shift Right Arithmetic           I
SLLV     Shift Left Logical Variable      I
SRLV     Shift Right Logical Variable     I
SRAV     Shift Right Arithmetic Variable  I

Table 1.6 CPU Instruction Set: Shift Instructions



OpCode   Description                      MIPS ISA Level

LWCz     Load Word to Coprocessor z       I
SWCz     Store Word from Coprocessor z    I
MTCz     Move To Coprocessor z            I
MFCz     Move From Coprocessor z          I
CTCz     Move Control To Coprocessor z    I
CFCz     Move Control From Coprocessor z  I
COPz     Coprocessor Operation z          I
BCzT     Branch on Coprocessor z True     I
BCzF     Branch on Coprocessor z False    I

Table 1.7 CPU Instruction Set: Coprocessor Instructions



OpCode   Description  MIPS ISA Level

SYSCALL  System Call  I
BREAK    Break        I

Table 1.8 CPU Instruction Set: Special Instructions

OpCode   Description                   MIPS ISA Level

LD       Load Doubleword               III
LDL      Load Doubleword Left          III
LDR      Load Doubleword Right         III
LL       Load Linked                   II
LLD      Load Linked Doubleword        III
LWU      Load Word Unsigned            III
SC       Store Conditional             II
SCD      Store Conditional Doubleword  III
SD       Store Doubleword              III
SDL      Store Doubleword Left         III
SDR      Store Doubleword Right        III
SYNC     Sync                          II

Table 1.9 MIPS 2/MIPS 3 Additional: Load and Store Instructions



OpCode   Description                        MIPS ISA Level

DADDI    Doubleword Add Immediate           III
DADDIU   Doubleword Add Immediate Unsigned  III

Table 1.10 MIPS 2/MIPS 3 Additional: Arithmetic Instructions (ALU Immediate)



OpCode   Description                   MIPS ISA Level

DMULT    Doubleword Multiply           III
DMULTU   Doubleword Multiply Unsigned  III
DDIV     Doubleword Divide             III
DDIVU    Doubleword Divide Unsigned    III

Table 1.11 MIPS 2/MIPS 3 Additional: Multiply and Divide Instructions

OpCode   Description                                              MIPS ISA Level

BEQL     Branch on Equal Likely                                   II
BNEL     Branch on Not Equal Likely                               II
BLEZL    Branch on Less Than or Equal to Zero Likely              II
BGTZL    Branch on Greater Than Zero Likely                       II
BLTZL    Branch on Less Than Zero Likely                          II
BGEZL    Branch on Greater Than or Equal to Zero Likely           II
BLTZALL  Branch on Less Than Zero And Link Likely                 II
BGEZALL  Branch on Greater Than or Equal to Zero And Link Likely  II
BCzTL    Branch on Coprocessor z True Likely                      II
BCzFL    Branch on Coprocessor z False Likely                     II

Table 1.12 MIPS 2/MIPS 3 Additional: Branch Instructions



OpCode   Description                   MIPS ISA Level

DADD     Doubleword Add                III
DADDU    Doubleword Add Unsigned       III
DSUB     Doubleword Subtract           III
DSUBU    Doubleword Subtract Unsigned  III

Table 1.13 MIPS 2/MIPS 3 Additional: Arithmetic Instructions
(3-operand, R-type)



OpCode   Description                                 MIPS ISA Level

DSLL     Doubleword Shift Left Logical               III
DSRL     Doubleword Shift Right Logical              III
DSRA     Doubleword Shift Right Arithmetic           III
DSLLV    Doubleword Shift Left Logical Variable      III
DSRLV    Doubleword Shift Right Logical Variable     III
DSRAV    Doubleword Shift Right Arithmetic Variable  III
DSLL32   Doubleword Shift Left Logical + 32          III
DSRL32   Doubleword Shift Right Logical + 32         III
DSRA32   Doubleword Shift Right Arithmetic + 32      III

Table 1.14 MIPS 2/MIPS 3 Additional: Shift Instructions

OpCode   Description                                      MIPS ISA Level

TGE      Trap if Greater Than or Equal                    II
TGEU     Trap if Greater Than or Equal Unsigned           II
TLT      Trap if Less Than                                II
TLTU     Trap if Less Than Unsigned                       II
TEQ      Trap if Equal                                    II
TNE      Trap if Not Equal                                II
TGEI     Trap if Greater Than or Equal Immediate          II
TGEIU    Trap if Greater Than or Equal Immediate Unsigned II
TLTI     Trap if Less Than Immediate                      II
TLTIU    Trap if Less Than Immediate Unsigned             II
TEQI     Trap if Equal Immediate                          II
TNEI     Trap if Not Equal Immediate                      II

Table 1.15 MIPS 2/MIPS 3 Additional: Exception Instructions



OpCode   Description                         MIPS ISA Level

DMFCz    Doubleword Move From Coprocessor z  II
DMTCz    Doubleword Move To Coprocessor z    II
LDCz     Load Double Coprocessor z           II
SDCz     Store Double Coprocessor z          II

Table 1.16 MIPS 2/MIPS 3 Additional: Coprocessor Instructions



OpCode   Description                   MIPS ISA Level

DMFC0    Doubleword Move From CP0      III
DMTC0    Doubleword Move To CP0        III
MTC0     Move to CP0                   I
MFC0     Move from CP0                 I
TLBR     Read Indexed TLB Entry        I
TLBWI    Write Indexed TLB Entry       I
TLBWR    Write Random TLB Entry        I
TLBP     Probe TLB for Matching Entry  I
CACHE    Cache Operation               R4xxx only
ERET     Exception Return              R4xxx only
WAIT     Enter Standby mode            R4600 only

Table 1.17 CP0 Instructions

Data Formats and Addressing

The R4600/R4700 processor uses four data formats: a 64-bit
doubleword, a 32-bit word, a 16-bit halfword, and an 8-bit byte. Byte
ordering within each of the larger data formats (halfword, word,
doubleword) can be configured in either big-endian or little-endian order.
Endianness refers to the location of byte 0 within the multi-byte data
structure. Figures 1.4 and 1.5 show the ordering of bytes within words and
the ordering of words within multiple-word structures for the big-endian
and little-endian conventions.

When the R4600/R4700 processor is configured as a big-endian system,
byte 0 is the most-significant (leftmost) byte, thereby providing
compatibility with MC 68000* and IBM 370* conventions. Figure 1.4 shows
this configuration.

Figure 1.4 Big-Endian Byte Ordering

When configured as a little-endian system, byte 0 is always the least-
significant (rightmost) byte, which is compatible with iAPX* x86 and DEC
VAX* conventions. Figure 1.5 shows this configuration.

Figure 1.5 Little-Endian Byte Ordering

In this text, bit 0 is always the least-significant (rightmost) bit; thus, bit
designations are always little-endian (although no instructions explicitly 
designate bit positions within words). 

Figures 1.6 and 1.7 show little-endian and big-endian byte ordering in 
doublewords. 

Figure 1.6 Little-Endian Data in a Doubleword

Figure 1.7 Big-Endian Data in a Doubleword
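
The two byte orderings can be illustrated by laying a value out in memory both ways. The following Python sketch uses only the built-in `int.to_bytes`; the helper name `memory_image` is hypothetical, and the sketch illustrates the convention only, not how the processor is configured.

```python
def memory_image(value, size_bytes, byte_order):
    """Return the bytes of value as they would appear at increasing addresses.

    byte_order is "big" or "little"; size_bytes is 2, 4, or 8 for a
    halfword, word, or doubleword.
    """
    mask = (1 << (8 * size_bytes)) - 1
    return (value & mask).to_bytes(size_bytes, byteorder=byte_order)


word = 0x11223344
big = memory_image(word, 4, "big")        # byte 0 holds the most-significant byte
little = memory_image(word, 4, "little")  # byte 0 holds the least-significant byte
```

Here `big[0]` is 0x11 and `little[0]` is 0x44: the same word, with byte 0 at opposite ends.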

The CPU uses byte addressing for halfword, word, and doubleword 
accesses with the following alignment constraints: 

• Halfword accesses must be aligned on an even byte boundary 
(0, 2, 4...). 

• Word accesses must be aligned on a byte boundary divisible by four
(0, 4, 8...).

• Doubleword accesses must be aligned on a byte boundary divisible by 
eight (0, 8, 16...). 
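
The alignment constraints above reduce to a single test on the low-order address bits. A minimal Python sketch follows; the names `ACCESS_SIZES` and `is_aligned` are hypothetical.

```python
# Access sizes in bytes, as listed in the alignment constraints.
ACCESS_SIZES = {"byte": 1, "halfword": 2, "word": 4, "doubleword": 8}


def is_aligned(address, kind):
    """True if address satisfies the alignment constraint for this access."""
    size = ACCESS_SIZES[kind]
    # Sizes are powers of two, so alignment means the low bits are all zero.
    return address & (size - 1) == 0
```

For example, address 6 is a valid halfword address but not a valid word or doubleword address.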

The following special instructions load and store words that are not
aligned on 4-byte (word) or 8-byte (doubleword) boundaries:

LWL LWR SWL SWR 

LDL LDR SDL SDR 

These instructions are used in pairs to provide addressing of misaligned
words. Addressing misaligned data incurs one additional instruction cycle
over that required for addressing aligned data, because each access
needs the second instruction of the pair (e.g., LWL and LWR form a pair).
Also note that the CPU moves the unaligned data at the same rate as a
hardware mechanism.

Figures 1.8 and 1.9 show the access of a misaligned word that has byte
address 3.

Figure 1.8 Big-Endian Misaligned Word Addressing

Figure 1.9 Little-Endian Misaligned Word Addressing
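
The way an LWL/LWR pair assembles a misaligned word can be modelled as below. This Python sketch follows the general MIPS definition of the two instructions for the big-endian configuration only; it models a 32-bit register, ignores the sign extension to 64 bits, and uses hypothetical function names.

```python
MASK32 = 0xFFFFFFFF


def lwl_big_endian(memory, address, register):
    """Load Word Left: fill the left (most-significant) part of the register."""
    offset = address & 3
    aligned = address - offset
    word = int.from_bytes(memory[aligned:aligned + 4], "big")
    shift = 8 * offset
    keep_mask = (1 << shift) - 1            # low bytes of the register survive
    return ((word << shift) & MASK32) | (register & keep_mask)


def lwr_big_endian(memory, address, register):
    """Load Word Right: fill the right (least-significant) part of the register."""
    offset = address & 3
    aligned = address - offset
    word = int.from_bytes(memory[aligned:aligned + 4], "big")
    shift = 8 * (3 - offset)
    keep_mask = (MASK32 << (32 - shift)) & MASK32   # high bytes survive
    return (register & keep_mask) | (word >> shift)


def load_unaligned_word(memory, address):
    """The pair: LWL at the first byte, LWR at the last byte of the word."""
    register = lwl_big_endian(memory, address, 0)
    return lwr_big_endian(memory, address + 3, register)
```

With `memory = bytes(range(16))`, loading the word at byte address 3 takes byte 3 from the first aligned word and bytes 4 to 6 from the next, giving 0x03040506.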

Coprocessors (CP0-CP2) 

The MIPS ISA (MIPS III) for the R4600/R4700 (and R4000/R4400)
defines three coprocessors (designated CP0 through CP2):

• Coprocessor 0 (CP0) is incorporated on the CPU chip and supports
the virtual memory system and exception handling. CP0 is also
referred to as the System Control Coprocessor.

• Coprocessor 1 (CP1) is incorporated on the R4600/R4700, and
implements the MIPS floating-point instruction set.

• Coprocessor 2 (CP2) is reserved for future use.

CP0 and CP1 are described in the sections that follow.

System Control Coprocessor, CP0

CP0 translates virtual addresses into physical addresses and manages
exceptions and transitions between kernel, supervisor, and user states.
CP0 also controls the cache subsystem, as well as providing diagnostic
control and error recovery facilities.

CP0 is also used to control power management for the R4600/
R4700 through the standby mode, which reduces the power
consumption of the internal core of the CPU. The standby mode is entered
by executing the WAIT instruction with the SysAD bus idle and is exited by
any interrupt. This feature is discussed in Appendix G.

The CP0 registers shown in Figure 1.10 and described in Table 1.18 on
page 1.17 manipulate the memory management and exception handling
capabilities of the CPU.

Note: The results of accessing reserved or undefined CP0 registers are
undefined. An exception may or may not result.

Figure 1.10 R4600/R4700 CP0 Registers

Number   Register   Description

0        Index      Programmable pointer into TLB array
1        Random     Pseudorandom pointer into TLB array (read only)
2        EntryLo0   Low half of TLB entry for even virtual page (VPN)
3        EntryLo1   Low half of TLB entry for odd virtual page (VPN)
4        Context    Pointer to kernel virtual page table entry (PTE) for
                    32-bit address spaces
5        PageMask   TLB Page Mask
6        Wired      Number of wired TLB entries
7        -          Reserved
8        BadVAddr   Bad virtual address
9        Count      Timer Count
10       EntryHi    High half of TLB entry
11       Compare    Timer Compare
12       SR         Status register
13       Cause      Cause of last exception
14       EPC        Exception Program Counter
15       PRId       Processor Revision Identifier
16       Config     Configuration register
17       LLAddr     Load Linked Address
18-19    -          Reserved
20       XContext   Pointer to kernel virtual PTE table for 64-bit
                    address spaces
21-25    -          Reserved
26       ECC        Secondary-cache error checking and correcting (ECC)
                    and Primary parity
27       CacheErr   Cache Error and Status register
28       TagLo      Cache Tag register
29       TagHi      Cache Tag register
30       ErrorEPC   Error Exception Program Counter
31       -          Reserved

Table 1.18 System Control Coprocessor (CP0) Register Definitions

Floating-Point Co-Processor

The R4600/R4700 incorporates an entire floating-point co-processor on 
chip, including a floating-point register file and execution units. The 
floating-point co-processor forms a "seamless" interface with the integer 
unit, decoding and executing instructions in parallel with the integer unit. 
The R4700 enhances the FPA implemented in the original R4600, resulting 
in an improved peak MFLOP rate. 

Floating-Point Units 

The R4600/R4700 floating-point execution units support single and
double precision arithmetic, as specified in the IEEE Standard 754. The
execution unit is broken into a separate multiply unit and a combined 
add/convert/divide/square root unit. Overlap of multiplies and add/ 
subtract is supported. The multiplier is partially pipelined, allowing a new 
multiply to begin every 6 cycles for the R4600, and every 4 cycles for the 
R4700. 
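
The multiply repeat rates above are consistent with the peak MFLOP/s figures in the feature list, as the arithmetic below suggests. This Python sketch assumes that the peak rate counts one multiply plus one overlapped add per multiply issue interval; the feature list states this for the R4700 ("1 mul, 1 add every 4 clock cycles"), and it is assumed here for the R4600. The function name is hypothetical.

```python
def peak_mflops(pipeline_mhz, multiply_repeat_cycles, flops_per_interval=2):
    """Peak MFLOP/s when one multiply and one add complete per interval."""
    return pipeline_mhz * flops_per_interval / multiply_repeat_cycles


r4600_peak = peak_mflops(133, 6)   # one multiply every 6 cycles at 133MHz
r4700_peak = peak_mflops(175, 4)   # one multiply every 4 cycles at 175MHz
```

Under that assumption the results are about 44.3 and 87.5, which truncate to the 44 and 87 peak MFLOP/s quoted in the feature list.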

As in the R3010 and R4000, the R4600/R4700 maintains fully precise 
floating-point exceptions while allowing both overlapped and pipelined 
operations. Precise exceptions are extremely important in mission-critical 
environments, such as Ada, and highly desirable for debugging in any
environment. 

The floating-point unit's operation set includes floating-point add, 
subtract, multiply, divide, square root, conversion between fixed-point and 
floating-point format, conversion among floating-point formats, and 
floating-point compare. These operations comply with the IEEE Standard 
754. 

Table 1.19 shows the latencies of some of the floating-point instructions 
in internal processor cycles. Due to pipelining, repeat rates may be higher. 
Also note that many operations are autonomous and can go in parallel. 



Operation     Single Precision  Double Precision

ADD           4                 4
SUB           4                 4
MUL (R4600)   8                 8
MUL (R4700)   4                 5
DIV           32                61
SQRT          31                60
CMP           3                 3
FIX           4                 4
FLOAT         6                 6
ABS           1                 1
MOV           1                 1
NEG           1                 1
LWC1, LDC1    2                 2
SWC1, SDC1    1                 1

Table 1.19 Floating-Point Latency Cycles

Virtual to Physical Address Mapping

The R4600/R4700 provides three modes of operation: 

• user mode 

• supervisor mode 

• kernel mode 

This mechanism is available to system software to provide a secure 
environment for user processes. Bits in a status register determine the 
mode of operation. In the user mode, the R4600/R4700 provides a single, 
uniform virtual address space of 256GB (2GB when Status.UX = 0). 

When operating in the kernel mode, four distinct virtual address spaces, 
totalling 1024GB (4GB when Status.KX = 0), are simultaneously available 
and are differentiated by the high-order bits of the virtual address. 

The R4600/R4700 processors also support a supervisor mode in which
the virtual address space is 256.5GB (2.5GB when Status.SX = 0), divided
into three regions based on the high-order bits of the virtual address.

When the R4600/R4700 uses 64-bit virtual addresses, the address 
space layouts are an upward compatible extension of the 32-bit virtual 
address space layout. A detailed description of the addressing is given in 
Chapter 4. 

Joint TLB 

For fast virtual-to-physical address decoding, the R4600/R4700 uses a 
large, fully associative TLB which maps 96 virtual pages to their
corresponding physical addresses. The TLB is organized as 48 pairs of 
even-odd entries, and maps a virtual address and address space identifier 
into the large, 64GB physical address space. 

Two mechanisms are provided to assist in controlling the amount of 
mapped space, and the replacement characteristics of various memory 
regions. First, the page size can be configured, on a per-entry basis, to map
a page size of 4KB to 16MB (in multiples of 4). A CP0 register is loaded with
the page size of a mapping, and that size is entered into the TLB when a 
new entry is written. Thus, operating systems can provide special purpose 
maps; for example, a typical frame buffer can be memory mapped using 
only one TLB entry. 
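
The page-size range and the even/odd pairing described above can be worked through numerically. The Python sketch below is illustrative only: the function names are hypothetical, it reads "one TLB entry" as one even/odd pair of pages, and it assumes the mapped region is naturally aligned. It does not model the PageMask register encoding.

```python
KB = 1024

# 4KB to 16MB in multiples of 4: 4KB, 16KB, 64KB, 256KB, 1MB, 4MB, 16MB.
PAGE_SIZES = [4 * KB * 4 ** step for step in range(7)]


def smallest_page_for(region_bytes):
    """Smallest page size whose even/odd pair (two pages) covers the region."""
    for size in PAGE_SIZES:
        if 2 * size >= region_bytes:
            return size
    return None  # too large for a single pair


def split_virtual_address(virtual_address, page_size):
    """Return (pair number, odd-page flag, offset within the page)."""
    page_number = virtual_address // page_size
    return page_number >> 1, page_number & 1, virtual_address % page_size
```

For example, a 2MB frame buffer fits in one pair of 1MB pages, and with 4KB pages the address 0x3456 falls in the odd page of pair 1 at offset 0x456.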

The second mechanism controls the replacement algorithm when a TLB 
miss occurs. The R4600/R4700 provides a random replacement algorithm 
to select a TLB entry to be written with a new mapping; however, the 
processor provides a mechanism whereby a system specific number of 
mappings can be locked into the TLB, and thus avoid being randomly 
replaced. This facilitates the design of real-time systems, by allowing 
deterministic access to critical software. 
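
The effect of locking entries can be sketched as random replacement restricted to the unlocked part of the TLB. The Python model below is a simplification under stated assumptions: the locked mappings are taken to be entries 0 through wired - 1 (the usual MIPS convention for the Wired register, which this chapter does not spell out), and Python's pseudo-random generator stands in for the hardware Random register. The names are hypothetical.

```python
import random

TLB_PAIRS = 48  # the joint TLB holds 48 even/odd entry pairs


def pick_replacement_index(wired, rng=random):
    """Choose a TLB index for a new mapping, never touching wired entries."""
    if not 0 <= wired < TLB_PAIRS:
        raise ValueError("wired must leave at least one replaceable entry")
    # Assumed: indices below `wired` are locked and are never selected.
    return rng.randrange(wired, TLB_PAIRS)
```

With `wired = 8`, every selected index lies in the range 8 to 47, so the first eight mappings are never replaced at random.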

The joint TLB also contains information to control the cache coherency 
protocol for each page. Specifically, each page has attribute bits to 
determine whether the coherency algorithm is: uncached, non-coherent 
write-back, non-coherent write-through write-allocate, non-coherent 
write-through no write-allocate, sharable, exclusive, or update. Non- 
coherent write-back is typically used for both code and data on the R4600/ 
R4700; the write-through modes support more efficient frame buffer 
accesses than the R4000 family. The coherent modes are supported for 
R4000 compatibility and generate different transaction types on the 
system interface; cache coherency itself is not supported, however.

Instruction TLB

The R4600/R4700 also incorporates a 2-entry instruction TLB. Each
entry maps a 4KB page. The instruction TLB improves performance by 
allowing instruction address translation to occur in parallel with data 
address translation. When a miss occurs on an instruction address 
translation, the least-recently used ITLB entry is filled from the JTLB. The 
operation of the ITLB is invisible to the user. 

Data TLB 

The R4600/R4700 also incorporates a 4-entry data TLB. Each entry
maps a 4KB page. The data TLB improves performance by allowing data
address translation to occur in parallel with instruction address
translation. When a miss occurs on a data address translation, the DTLB
is filled from the JTLB. The DTLB refill is pseudo-LRU: the least
recently used entry of
the least recently used half is filled. The operation of the DTLB is invisible 
to the user. 
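
The DTLB refill rule described above can be sketched as a small software model. This is only an illustration: the class name, the three usage bits and their encoding are assumptions made for the example, not a description of the actual hardware.

```python
class PseudoLruDtlb:
    """Toy model of a 4-entry DTLB with two-level pseudo-LRU refill.

    Entries 0-1 form one half and entries 2-3 the other. The bookkeeping
    below is a hypothetical encoding chosen for this sketch.
    """

    def __init__(self):
        self.entries = [None, None, None, None]  # cached virtual page numbers
        self.last_half = 1           # half touched most recently (0 or 1)
        self.last_in_half = [1, 1]   # per half: entry touched most recently

    def _touch(self, index):
        half, way = divmod(index, 2)
        self.last_half = half
        self.last_in_half[half] = way

    def victim(self):
        """Least recently used entry of the least recently used half."""
        half = 1 - self.last_half
        way = 1 - self.last_in_half[half]
        return 2 * half + way

    def translate(self, vpn):
        """Return (hit, entry index); on a miss the victim entry is refilled."""
        if vpn in self.entries:
            index = self.entries.index(vpn)
            hit = True
        else:
            index = self.victim()
            self.entries[index] = vpn  # in hardware this refill comes from the JTLB
            hit = False
        self._touch(index)
        return hit, index
```

With four distinct pages followed by a re-use of the first one, the next miss replaces the entry that a true LRU policy would also have chosen; in general pseudo-LRU only approximates LRU.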

Cache Memory 

In order to keep the R4600/R4700's high-performance pipeline full and 
operating efficiently, the R4600/R4700 incorporates on-chip instruction 
and data caches that can be accessed in a single processor cycle. Each 
cache has its own 64-bit data path and can be accessed in parallel. The 
cache subsystem provides the integer and floating-point units with an 
aggregate bandwidth of 1.6GB per second at a system clock frequency of 
50MHz. 

Furthermore, the large, two-way set associative caches increase 
emulation performance of DOS and Windows 3.1 applications when 
running under Windows NT. 

Instruction Cache 

The R4600/R4700 incorporates a two-way set associative on-chip 
instruction cache. This virtually indexed, physically tagged cache is 16KB 
in size and is protected with word parity. 

Because the cache is virtually indexed, the virtual-to-physical address 
translation occurs in parallel with the cache access, thus further 
increasing performance by allowing these two operations to occur 
simultaneously. The tag holds a 24-bit physical address and valid bit, and 
is parity protected. 

The instruction cache is 64 bits wide, and can be refilled or accessed in 
a single processor cycle. Instruction fetches require only 32 bits per cycle, 
for a peak instruction bandwidth of 700 MB/sec @ 175MHz. Sequential 
accesses take advantage of the 64-bit fetch to reduce power dissipation, 
and cache miss refill writes 64 bits per cycle to minimize the cache miss 
penalty. The line size is eight instructions (32 bytes) to maximize 
performance. 
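
As an illustration of virtual indexing with physical tagging, the sketch below splits an address pair into set index, byte offset and tag. It assumes the figures given in this manual (16KB, two-way, 32-byte lines, 4KB minimum page, 36-bit physical address); the function and constant names are hypothetical.

```python
CACHE_BYTES = 16 * 1024
WAYS = 2
LINE_BYTES = 32
SETS = CACHE_BYTES // (WAYS * LINE_BYTES)  # 256 lines per way


def split_address(vaddr, paddr):
    """Return (index, offset, tag) for a virtually indexed, physically tagged lookup.

    The index and offset come from the virtual address, so the cache access
    can start before translation finishes; the tag comes from the translated
    physical address and is compared afterwards.
    """
    offset = vaddr & (LINE_BYTES - 1)        # vAddr bits 4..0
    index = (vaddr // LINE_BYTES) % SETS     # vAddr bits 12..5
    tag = (paddr >> 12) & ((1 << 24) - 1)    # pAddr bits 35..12 (24 bits)
    return index, offset, tag
```

For example, `split_address(0x1234, 0xABCD1234)` selects set 145, byte 20 of the line, and compares against tag 0xABCD1 in both ways of that set.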

Data Cache 

For fast, single cycle data access, the R4600/R4700 includes a 16KB on- 
chip data cache that is two-way set associative with a fixed 32-byte (eight 
words) line size. Both the D-cache and the I-cache can be accessed each 
pipeline cycle; thus, the data bandwidth is 1400 MB/sec @ 175 MHz, in 
addition to the 700 MB/sec instruction bandwidth. 
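
The bandwidth figures quoted in this section follow from bytes per cycle multiplied by clock frequency. The short check below reproduces them, assuming 1 MB = 10^6 bytes and, for the aggregate figure, a 100MHz pipeline clock (twice a 50MHz MasterClock); the function name is illustrative.

```python
def peak_bandwidth_mb_per_s(bytes_per_cycle, clock_mhz):
    """Peak transfer rate in MB/sec for a path moving bytes_per_cycle each cycle."""
    return bytes_per_cycle * clock_mhz


# 32-bit instruction fetch per cycle at 175MHz
instruction_bw = peak_bandwidth_mb_per_s(4, 175)
# 64-bit data access per cycle at 175MHz
data_bw = peak_bandwidth_mb_per_s(8, 175)
# Two 64-bit cache paths used in parallel at an assumed 100MHz pipeline clock
aggregate_bw = peak_bandwidth_mb_per_s(8 + 8, 100)
```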

The data cache is protected with byte parity and its tag is protected with 
a single parity bit. It is virtually indexed and physically tagged to allow 
simultaneous address translation and data cache access. 

The normal write policy is writeback, which means that a store to a cache 
line does not immediately cause memory to be updated. This increases 
system performance by reducing bus traffic and eliminating the bottleneck 
of waiting for each store operation to finish before issuing a subsequent 
memory operation. Software can, however, select write-through on a 
per-page basis when it is appropriate, such as for frame buffers. 

Associated with the Data Cache is the store buffer. When the R4600/ 
R4700 executes a Store instruction, this single-entry buffer gets written 
with the store data while the tag comparison is performed. If the tag 
matches, then the data is written into the Data Cache in the next cycle that 
the Data Cache is not accessed (the next non-load cycle). The store buffer 
allows the R4600/R4700 to execute a store every processor cycle and to 
perform back-to-back stores without penalty. 

Write Buffer 

Writes to external memory, whether cache miss writebacks or stores to 
uncached or write-through addresses, use the on-chip write buffer. The 
write buffer holds up to four 64-bit address and data pairs or 1 cache line 
to be written back. The entire buffer is used for a data cache writeback and 
allows the processor to proceed in parallel with memory update. For 
uncached and write-through stores, the write buffer significantly increases 
performance over the R4000 family of processors. 

R4600/R4700 Clocks 

The R4600/R4700 has a number of clocks for the user. First, there is 
the pipeline clock, PClock. This clock is used for the pipeline and pipeline 
related functions internal to the R4600/R4700. It is two times the 
MasterClock frequency. The next clock is the system interface clock, 
SClock. This is also an internal clock and is used to sample data at the 
system interface and to clock data into the processor system interface 
output registers. The SClock is a divided version of the PClock. The divisor 
is selected at boot time. 
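
The relationship between the internal clocks can be expressed as a short calculation. This is a sketch only: the function name is hypothetical, and the set of legal divisors is taken from the Sclock divisor row of Table 1.20.

```python
SCLOCK_DIVISORS = (2, 3, 4, 5, 6, 7, 8)  # per the Sclock divisor row of Table 1.20


def internal_clocks(master_mhz, divisor):
    """Return (PClock, SClock) in MHz for a given MasterClock and boot-time divisor."""
    if divisor not in SCLOCK_DIVISORS:
        raise ValueError("unsupported SClock divisor: %r" % (divisor,))
    pclock = 2 * master_mhz      # pipeline clock is twice MasterClock
    sclock = pclock / divisor    # system interface clock is a divided PClock
    return pclock, sclock
```

For a 50MHz MasterClock and a divisor of 2, this gives a 100MHz PClock and a 50MHz SClock.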

There are three external clocks. (Some outputs are replicated to minimize 
loading.) The MasterOut is at the same frequency as MasterClock and can 
be used to clock certain external logic. The other clocks are used by the 
external agent. These are the TClock, Transmit clock, and the RClock, 
Receive clock. The TClock is used to clock the output registers (signals 
transmitted to the R4600/R4700) of the external agent and is at the same 
frequency as SClock. The RClock is used to clock the input register (signals 
received from the R4600/R4700) of the external agent. It is also at the 
same frequency as the SClock but its phase leads the SClock and TClock 
by 25%. The R4600/R4700 implements an on-chip PLL to eliminate the 
effects of clock skew. 

System Interface 

The R4600/R4700 supports a 64-bit system interface that is compatible 
with the R4000PC system interface. This interface operates from two 
clocks provided by the R4600/R4700, TClock[1:0] and RClock[1:0], at a 
division of the pipeline clock. 

The interface consists of a 64-bit Address/Data bus with 8 check bits 
and a 9-bit command bus. In addition, there are 8 handshake signals and 
6 interrupt inputs. The interface has a simple timing specification and is 
capable of transferring data between the processor and memory at a peak 
rate of 400MB/sec at 50MHz. 

Figure 1.11 shows a typical system using the R4600/R4700. In this 
example there is DRAM, a boot EPROM and an optional secondary cache. 

Figure 1.11 Typical System Block Diagram 

Comparison of R4600/R4700 and R4400 

This section compares features of the R4600/R4700 to the earlier R4400 
PC. Table 1.20 to Table 1.26 highlight some of the differences between the 
R4600/R4700 and the R4400 PC. This list is not exhaustive. 



| Item | R4400PC | R4600/R4700 |
|------|---------|-------------|
| I/O | R4400: TTL compatible; RV4400: LV CMOS | R4600/R4700: TTL-compatible (5V±0.5%); RV4600/RV4700: LVCMOS (3.3V±0.3V) |
| Package | 179-pin ceramic PGA | same and 208-pin MQUAD |
| JTAG | yes | no (serial out connected directly to serial in) |
| Block transfer sizes | 16B or 32B | 32B |
| Sclock divisor | 2, 3, 4, 6, 8 | 2, 3, 4, 5, 6, 7, 8 |
| Non-block writes | max throughput of 4 sclock cycles | two new system interface protocol options that support 2 sclock cycle throughput (remains 4 in compatibility mode) |
| Serial configuration | as described in R4000 User's Guide | different, as described in Table 9.2 on page 9-7 |
| Address bits 63..56 on reads and writes | zero | bits 19..12 of virtual address |
| Uncached and write-through stores | uncached stores are buffered in 1-entry uncached store buffer (write through not possible) | uncached and write-through stores buffered in 4-entry write buffer |
| SysADC | parity only | same |
| SysADC for non-data cycles | parity | zero |
| SysCmdP | parity | zero |
| Parity error during writeback | use Cache Error exception | output bad parity |
| Error bit in data identifier of read responses | Bus Error if error bit set for any doubleword | only check error bit of first doubleword; all other error bits ignored |
| Parity error on read data | Bus Error if parity error in any doubleword | bad parity written to cache; take Cache Error exception if bad parity occurs on doublewords that the processor is waiting for |
| Block writes | 1-2 null cycles between address and data | cycles between address and data |
| Release after Read Request | variable latency | latency |
| SysAD value for x cycles of writeback data pattern | data bus undefined | data bus maintains last D cycle value |
| SysAD bus use after last D cycle of writeback | data bus undefined | trailing x cycles (e.g. DDxxDDxx, not DDxxDD) follow rule in entry immediately preceding |
| Output slew rate | dynamic feedback control | simple CMOS output buffers with 2-bit static strength control |
| IOOut output | output slew rate control feedback loop output | driven HIGH, do not connect (reserved for future output) |
| IOIn input | output slew rate control input | should be driven high (reserved for future input) |
| GrpRunB output | do not connect | same (reserved for future output) |
| GrpStallB input | should be connected to VCC | same (reserved for future input) |
| FaultB output pin | indicates compare mismatch | driven HIGH, do not connect (reserved for future output) |

Table 1.20 System Interface Comparison Between R4400 PC and R4600/R4700 

| Item | R4400 PC | R4600/R4700 |
|------|----------|-------------|
| Cache Sizes | 16KB Instruction cache, 16KB Data cache | 16KB Instruction cache, 16KB Data cache |
| Cache Line Sizes | software selectable between 16B and 32B | fixed at 32B |
| Cache Index | vAddr 13 . | vAddr 12 .. |
| Cache Tag | pAddr 35..12 | same |
| Cache Organization | direct mapped | 2-way set associative |
| Data cache write policy | write-allocate and write-back | write-allocate or not based on TLB entry, write-through or not based on TLB entry |
| Data cache miss | stall, output address, copy dirty data to writeback buffer, refill cache, output writeback data | same, with FIFO to select the set to refill |
| Data order for block reads | sub-block ordering | same |
| Data order for block writes | sequential | same |
| Instruction cache miss restart | restart after all data received and written to cache | same |
| Data cache miss restart | restart after all data received and written to cache | restart on first doubleword, send subsequent doublewords to response buffer |
| Instruction Tag | 2-bit cache state | 1-bit cache state |
| Cache miss overhead | 5-8 cycles | 3 cycles |
| Instruction cache parity | 1 parity bit per 8 data bits | 1 parity bit per 32 data bits |
| Data cache parity | 1 parity bit per 8 data bits | same |

Table 1.21 Cache Comparison Between R4400 PC and R4600/R4700 

| Item | R4400 PC | R4600/R4700 |
|------|----------|-------------|
| Instruction virtual address translation | 2-entry ITLB | same |
| ITLB miss | 1 cycle penalty, refilled from JTLB, LRU replacement | 1 cycle on branch, jump, and ERET, 2 cycles otherwise, refilled from JTLB, LRU replacement |
| Data virtual address translation | done directly in JTLB | 4-entry DTLB |
| DTLB miss | n.a. | 1 cycle penalty, refilled from JTLB, pseudo-LRU replacement |
| JTLB | 48 entries of even/odd page pairs, fully associative | same |
| Page size | 4KB, 16KB, ..., 16MB | same |
| Multiple entry match in JTLB | sets TS in Status and disables TLB until Reset to prevent damage | no damage for multiple match; no detection or shutdown implemented |
| Virtual address size | VSIZE = 40 | same |
| Physical address size | PSIZE = 36 | same |

Table 1.22 TLB Comparison Between R4400 PC and R4600/R4700 



| Item | R4400 PC | R4600/R4700 |
|------|----------|-------------|
| ALU latency | 1 cycle | 1 cycle |
| Load latency | 3 cycles | 2 cycles |
| Branch latency | 4 cycles (2 cycle penalty for taken branches) | 2 cycles (no penalty for taken branches) |
| Store buffer (not write buffer) | 2 doublewords | 1 doubleword |
| Integer multiply | integer multiply hardware, 1 cycle to issue | done in floating-point multiplier, 4 cycles to issue |
| Integer divide | done in integer datapath adder, slips until done | done in floating-point adder, 4 cycles to issue |
| Integer multiply | HIGH and LOW available at the same time | LOW available one cycle before HIGH |
| Integer divide | HIGH and LOW available at the same time | HIGH available one cycle before LOW |
| HIGH and LOW hazards | yes, HIGH and LOW written early in pipeline | no, HIGH and LOW written after W |
| MFHI/MFLO latency | 1 cycle | 2 cycles |
| SLLV, SRLV, SRAV | 2 cycles to issue | 1 cycle to issue |
| DSLL, DSRL, DSRA, DSLL32, DSRL32, DSRA32, DSLLV, DSRLV, DSRAV | 2 cycles to issue | 1 cycle to issue |

Table 1.23 Pipeline Comparison Between R4400 PC and R4600/R4700 

| Item | R4400 PC | R4600/R4700 |
|------|----------|-------------|
| WatchLo, WatchHi | implemented | unimplemented (no watch registers) |
| Config | as described in R4000 User's Guide | subset |
| Status | as described in R4000 User's Guide, but RP not functional | no TS or RP |
| Low-power standby mode | no | WAIT instruction disables internal clock, freezing pipeline and other state; resume on interrupt |
| MFC0/MTC0 hazard | only hazardous for certain cp0 register combinations | always hazardous; detected and 1-cycle slip inserted |
| EntryLo0, EntryLo1 | as described in R4000 User's Guide | two new cache algorithms added to C field for non-coherent write-through |
| TagLo, TagHi, ECC, CacheErr | R4400SC bits implemented but meaningless | only bits meaningful on R4400 PC implemented |
| TagLo | as described in R4000 User's Guide | bits 5..3 read/writeable but otherwise unused, bit 2 used for F bit |
| Exceptions | as described in R4000 User's Guide (VCEI and VCED not possible) | VCEI, VCED, and WATCH exceptions not implemented |
| Index CACHE ops, I Fill CACHE op | use vAddr 13..4 to select line | use vAddr 13 to select set, vAddr 12..5 to select line of set |
| Index Store Tag CACHE op | Status.CE ignored | TagLo.P stored if Status.CE set |
| PRId | Imp = 0x04 | R4600: Imp = 0x20; R4700: Imp = 0x21 |

Table 1.24 Coprocessor Comparison Between R4400 PC and R4600/R4700 



| Item | R4400 PC | R4600/R4700 |
|------|----------|-------------|
| Possible exception stall | only for operands that can cause exceptions | some simplifications in detection hardware |
| Floating-point divide | separate divide unit | done in floating-point adder |
| Floating-point square root | done in floating-point adder | same |
| Converts to/from 64-bit integer | uses unimplemented for integer operands/results with more than 53 bits of precision | handles full 64-bit operands and results |
| Floating-point registers | Status.FR enables all 32 floating point registers | same |
| FCR0 | Imp = 0x05 | R4600: Imp = 0x20; R4700: Imp = 0x21 |

Table 1.25 Coprocessor 1 Comparison Between R4400 PC and R4600/R4700 

Introduction 

This chapter is an overview of the central processing unit (CPU) 
instruction set; refer to Appendix A for detailed descriptions of individual 
CPU instructions. 

An overview of the floating-point unit (FPU) instruction set is in 
Chapter 6; refer to Appendix B for detailed descriptions of individual FPU 
instructions. 

CPU Instruction Formats 

Each CPU instruction consists of a single 32-bit word, aligned on a word 
boundary. There are three instruction formats, as shown in Figure 2.1: 
immediate (I-type), jump (J-type), and register (R-type). The use of a small 
number of instruction formats simplifies instruction decoding (and thus 
permits higher-frequency operation) and allows the compiler to synthesize 
more complicated (and less frequently used) operations and addressing 
modes from these three formats as needed. 

op         6-bit operation code 
rs         5-bit source register specifier 
rt         5-bit target (source/destination) register or branch condition 
immediate  16-bit immediate value, branch displacement or address 
           displacement 
target     26-bit jump target address 
rd         5-bit destination register specifier 
sa         5-bit shift amount 
funct      6-bit function field 

Figure 2.1 CPU Instruction Formats 
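
As an illustration of the three formats, the sketch below extracts every field from a 32-bit instruction word. The field names (op, rs, rt, rd, sa, funct, immediate, target) and bit positions follow the conventional MIPS encoding and are assumptions here, since the figure itself is not reproduced in this text; the function name is hypothetical.

```python
def decode_fields(word):
    """Split a 32-bit MIPS instruction word into the fields of all three formats.

    Which fields are meaningful depends on the format: I-type uses
    op/rs/rt/immediate, J-type uses op/target, R-type uses op/rs/rt/rd/sa/funct.
    """
    word &= 0xFFFFFFFF
    return {
        "op": (word >> 26) & 0x3F,      # bits 31..26
        "rs": (word >> 21) & 0x1F,      # bits 25..21
        "rt": (word >> 16) & 0x1F,      # bits 20..16
        "rd": (word >> 11) & 0x1F,      # bits 15..11
        "sa": (word >> 6) & 0x1F,       # bits 10..6
        "funct": word & 0x3F,           # bits 5..0
        "immediate": word & 0xFFFF,     # bits 15..0
        "target": word & 0x3FFFFFF,     # bits 25..0
    }
```

For example, the word 0x00221821 (ADDU with rs = 1, rt = 2, rd = 3) decodes with op = 0 and funct = 0x21, while 0x8D280004 (LW with base register 9, target register 8, offset 4) decodes with op = 0x23.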

In the MIPS architecture, coprocessor instructions are implementation- 
dependent; refer to Appendix A for details of individual Coprocessor 
instructions. 

Load and Store Instructions 

Load and store are immediate (I-type) instructions that move data 
between memory and the general registers. The only addressing mode that 
load and store instructions directly support is base register plus 16-bit 
signed immediate offset. 

Scheduling a Load Delay Slot 

A load instruction that does not allow its result to be used by the 
instruction immediately following is called a delayed load instruction. The 
instruction slot immediately following this delayed load instruction is 
referred to as the load delay slot. 

In the R4600/R4700 processor, the instruction immediately following a 
load instruction can request the contents of the loaded register; however, 
in such cases, hardware interlocks insert additional real cycles. 
Consequently, scheduling load delay slots can be desirable, both for 
performance and R-Series (e.g., R3051) processor compatibility. However, 
the scheduling of load delay slots is not absolutely required. 

Defining Access Types 

Access type indicates the size of an R4600/R4700 processor data item 
to be loaded or stored, set by the load or store instruction opcode. Access 
types are defined in Appendix A. 

Regardless of access type or byte ordering (endianness), the address 
given specifies the low-order byte in the addressed field. For a big-endian 
configuration, the low-order byte is the most-significant byte; for a little- 
endian configuration, the low-order byte is the least-significant byte. 

The access type, together with the three low-order bits of the address, 
defines the bytes accessed within the addressed doubleword, as shown 
in Table 2.1 on page 2-3. 

Only the combinations shown in Table 2.1 are permissible; other 
combinations cause address error exceptions. See Appendix A for 
individual descriptions of CPU load and store instructions. 
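
The addressing rule above can be illustrated for naturally aligned accesses. The sketch below is an assumption-laden model, not a reproduction of Table 2.1: it covers only 1-, 2-, 4- and 8-byte aligned accesses, numbers the doubleword bits 63..0, and uses a hypothetical function name.

```python
def doubleword_bit_range(address, size, big_endian):
    """Return (high_bit, low_bit) of the doubleword occupied by an aligned access.

    The address names the low-order byte of the field: the most-significant
    byte in a big-endian configuration, the least-significant byte in a
    little-endian one.
    """
    if size not in (1, 2, 4, 8) or address % size:
        raise ValueError("only naturally aligned 1/2/4/8-byte accesses are modelled")
    offset = address & 0x7               # three low-order address bits
    if big_endian:
        high = 63 - 8 * offset           # byte 0 sits in bits 63..56
        low = high - 8 * size + 1
    else:
        low = 8 * offset                 # byte 0 sits in bits 7..0
        high = low + 8 * size - 1
    return high, low
```

A word access at an address ending in binary 100 therefore uses bits 31..0 in a big-endian configuration and bits 63..32 in a little-endian one.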

Table 2.1 Byte Access within a Doubleword 

Computational Instructions 

Computational instructions can be either: 1) in register (R-type) format, 
in which both operands are registers, or 2) in immediate (I-type) format, in 
which one operand is a 16-bit immediate. 

Computational instructions perform the following operations on register 
values: 

• arithmetic 

• logical 

• shift 

• multiply 

• divide 

These operations fit in the following four categories of computational 
instructions: 

• ALU Immediate instructions 

• three-operand register-type instructions 

• shift instructions 

• multiply and divide instructions 

64-bit Virtual Address Operations with 32-bit operands 

Operands to 32-bit operand opcodes must be in sign-extended form. 32- 
bit operand opcodes include all non-doubleword operations, such as: ADD, 
ADDU, SUB, SUBU, ADDI, SLL, SRL, SRA, SLLV, etc. The result of 
operations that use incorrectly sign-extended 32-bit values is unpredictable. 
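
The sign-extended form required of 32-bit operands can be checked with a few lines of code. This is a minimal sketch that models a 64-bit register as a Python integer; the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sign_extend_32(value):
    """Return the 64-bit sign-extended form of the low 32 bits of value."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:               # bit 31 set: replicate it into bits 63..32
        return value | 0xFFFFFFFF00000000
    return value


def is_sign_extended_32(reg64):
    """True if bits 63..32 of the register all equal bit 31."""
    reg64 &= MASK64
    return reg64 == sign_extend_32(reg64)
```

A register holding 0x0000000080000000 is not in sign-extended form, so using it as an operand of ADDU, for example, would give an unpredictable result; 0xFFFFFFFF80000000 is the valid form of the same 32-bit value.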

Cycle Timing for Multiply and Divide Instructions 

MFHI and MFLO instructions (described in Appendix A) are interlocked 
so that any attempt to read the HI or LO register before a prior multiply 
or divide instruction completes delays the MFHI or MFLO instruction until 
the prior instruction finishes. 

Table 2.2 gives the number of processor cycles (PCycles) required to 
resolve an interlock or stall between various multiply or divide 
instructions, and a subsequent MFHI or MFLO instruction. 

Table 2.2 Multiply/Divide Instruction Cycle Timing 

For more information about computational instructions, refer to the 
individual instruction as described in Appendix A. 

Jump and Branch Instructions 

Jump and branch instructions change the control flow of a program. All 
jump and branch instructions occur with a delay of one instruction: that 
is, the instruction immediately following the jump or branch (this is known 
as the instruction in the delay slot) always executes while the target 
instruction is being fetched from storage. 

Overview of Jump Instructions 

Subroutine calls in high-level languages are usually implemented with 
Jump or Jump and Link instructions, both of which are J-type 
instructions. In J-type format, the 26-bit target address shifts left 2 bits 
and combines with the high-order 4 bits of the current program counter to 
form an absolute address. 
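
The J-type address formation can be sketched as follows for 32-bit addresses. The function name is hypothetical, and the choice of which program counter value supplies the high-order bits is left to the caller; the sketch only shows how the two parts are combined.

```python
def jump_target(pc, target26):
    """Combine the high-order 4 bits of pc with the 26-bit target shifted left 2."""
    region = pc & 0xF0000000                   # high-order 4 bits of the PC
    offset = (target26 & 0x3FFFFFF) << 2       # 28-bit word-aligned offset
    return region | offset
```

Because only the low 28 bits come from the instruction, a J-type jump cannot leave the current 256MB-aligned region; the register forms of the jump instructions are needed for that.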

Returns, dispatches, and large cross-page jumps are usually 
implemented with the Jump Register or Jump and Link Register 
instructions. Both are R-type instructions that take the 32-bit or 64-bit 
byte address contained in one of the general purpose registers. 

For more information about jump instructions, refer to the individual 
instruction as described in Appendix A. 

Overview of Branch Instructions 

All branch instruction target addresses are computed by adding the 
address of the instruction in the delay slot to the 16-bit offset (shifted left 
2 bits and sign-extended to 32 bits). All branches occur with a delay of 
one instruction. 
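
The branch target computation can be sketched for 32-bit addresses as shown below. The function name is hypothetical, and the delay slot is assumed to sit at the branch address plus 4.

```python
def branch_target(branch_pc, offset16):
    """Return the target of a branch located at branch_pc (32-bit addressing)."""
    delay_slot_pc = branch_pc + 4        # instruction in the delay slot
    offset = offset16 & 0xFFFF
    if offset & 0x8000:                  # sign-extend the 16-bit offset
        offset -= 0x10000
    return (delay_slot_pc + (offset << 2)) & 0xFFFFFFFF
```

An offset of 0xFFFF (that is, -1) therefore branches back to the branch instruction itself, and an offset of 3 lands four instructions past the branch.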

If a conditional branch likely is not taken, the instruction in the delay 
slot is nullified. For regular conditional branches, the delay slot is always 
executed. 

For more information about branch instructions, refer to the individual 
instruction as described in Appendix A. 

Special Instructions 

Special instructions allow the software to initiate traps; they are always 
R-type. For more information about special instructions, refer to the 
individual instruction as described in Appendix A. 

Exception Instructions 

Exception instructions are extensions to the MIPS ISA. For more 
information about exception instructions, refer to the individual 
instruction as described in Appendix A. 

Coprocessor Instructions 

Coprocessor instructions perform operations in their respective 
coprocessors. Coprocessor loads and stores are I-type, and coprocessor 
computational instructions have coprocessor-dependent formats. 

Individual coprocessor instructions are described in Appendices A (for 
CP0) and B (for the FPU, CP1). 

CP0 instructions perform operations specifically on the System Control 
Coprocessor registers to manipulate the memory management and 
exception handling facilities of the processor. Appendix A contains details 
of the CP0 instructions. 

Introduction 

This chapter describes the basic operation of the CPU pipeline, which 
includes descriptions of the delay instructions (instructions that follow a 
branch or load instruction in the pipeline), interruptions to the pipeline 
flow caused by interlocks and exceptions, and R4600/R4700 
implementation of an uncached store buffer. The FPU pipeline is described 
in a later chapter. 

CPU Pipeline Operation 

The R4600/R4700 uses a 5-stage pipeline similar to the R3000. The 
simplicity of this pipeline allows the R4600/R4700 to be lower cost and 
lower power than super-scalar or super-pipelined processors. Unlike the 
R3000, the R4600/R4700 does virtual to physical translation in parallel 
with cache access. This allows the R4600/R4700 to operate at over twice 
the frequency of the R3000 and to support a larger TLB for address 
translation. 

Compared to the 8-stage R4000 pipeline, the R4600/R4700 is more 
efficient (requires fewer stalls). 

Once the pipeline has been filled, five instructions are executed 
simultaneously. Figure 3.1 shows the five stages of the instruction 
pipeline; the next section describes the pipeline stages. 
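
The overlap of instructions in an ideal, stall-free pipeline can be visualised with a short sketch. The stage letters I, R, A, D and W are those used in this chapter; the function names are illustrative, and interlocks and exceptions are deliberately ignored.

```python
STAGES = ("I", "R", "A", "D", "W")


def pipeline_chart(instruction_count):
    """Return one text row per instruction showing its stage in each cycle."""
    total_cycles = instruction_count + len(STAGES) - 1
    rows = []
    for i in range(instruction_count):
        row = ["-"] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage           # instruction i enters stage s in cycle i + s
        rows.append(" ".join(row))
    return rows


def instructions_in_flight(instruction_count, cycle):
    """Count instructions occupying some pipeline stage in the given cycle."""
    return sum(1 for i in range(instruction_count)
               if i <= cycle < i + len(STAGES))
```

Printing `pipeline_chart(6)` shows the familiar staircase; from the fifth cycle onward (cycle index 4) five instructions are in flight at once until the stream drains.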

Figure 3.1 Instruction Pipeline Stages 

CPU Pipeline Stages 

This section describes each of the phases of the five pipeline stages. 
Each stage has 2 phases: 

1I - Instruction Fetch, Phase one 
2I - Instruction Fetch, Phase two 
1R - Register Fetch, Phase one 
2R - Register Fetch, Phase two 
1A - Execution, Phase one 
2A - Execution, Phase two 
1D - Data Fetch, Phase one 
2D - Data Fetch, Phase two 
1W - Write Back, Phase one 
2W - Write Back, Phase two 

1I - Instruction Fetch, Phase one 

During the 1I phase the instruction address translation begins in the 
ITLB. 

2I - Instruction Fetch, Phase two 

During the 2I phase, the instruction cache fetch begins and the 
instruction address translation in the ITLB continues. 

1R - Register Fetch, Phase one 

During the 1R phase, the following occurs: 

• The instruction cache fetch finishes. 

• The instruction cache tag is checked against the page frame number 
obtained from the ITLB. 

2R - Register Fetch, Phase two 

During the 2R phase, the following occurs: 

• The instruction decoder decodes the instruction. 

• Any required operands are fetched from the register file. 

• Make a decision to either issue or slip (for an interlock condition). 

• For a branch, the branch address is calculated. 

1A - Execution, Phase one 

During the 1A phase, one of the following occurs: 

• Any result from the A or D stages is bypassed. 

• The arithmetic logic unit (ALU) starts the integer arithmetic, logical or 
shift operation. 

• The ALU calculates the data virtual address for load and store in- 
structions. 

• The ALU determines whether the branch condition is true. 

2A - Execution, Phase two 

During the 2A phase, one of the following occurs: 

• The integer arithmetic, logical or shift operation will complete. 

• A data cache access will start. 

• Store data is shifted to the specified byte position(s). 

• The data virtual to physical address translation in the DTLB will start. 

1D - Data Fetch, Phase one 

During the 1D phase, one of the following occurs: 

• The data cache access will continue. 

• The data address translation in the DTLB completes. 

• The virtual to physical address translation in the JTLB will start. 

2D - Data Fetch, Phase two 

During the 2D phase, one of the following occurs: 

• The data cache access will finish and the data is shifted down and 
extended. 

• The virtual to physical address translation in the JTLB will finish. 

• The data cache tag is checked against the PFN from the DTLB or JTLB 
for any data cache access. 

1W - Write Back, Phase one 

This phase is used internally by the processor to resolve all exceptions, 
in preparation for the register file write. 

2W - Write Back, Phase two 

For register-to-register and load instructions, the result is written back 
to the register file during the 2W stage. Branch instructions perform no 
operation during this stage. 

Figure 3.2 shows the activities occurring during each ALU pipeline 
stage, for load, store, and branch instructions. 

Figure 3.2 CPU Pipeline Activities 

Branch Delay 

The CPU pipeline has a branch delay of one cycle and a load delay of one 
cycle. The one-cycle branch delay is a result of the branch decision logic 
operating during the 1A pipeline phase of the branch instruction. This 
allows the branch target address calculated in the previous phase to be 
used for the instruction access in the following 1I phase. The pipeline will 
begin the fetch of the branch path as well as the fall-through path in the 
cycle following the delay slot. After the branch decision is made, the 
processor will continue with the fetch of either the branch path (for a taken 
branch) or the fall-through path (for the non-taken branch). 

Figure 3.3 illustrates the branch delay. 

Figure 3.3 CPU Pipeline Branch Delay 



Load Delay 

The completion of a load at the end of the 2D pipeline phase produces 
an operand that is available for the 1A pipeline phase of the instruction 
following the load delay slot. 
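
The effect of the one-cycle load delay on instruction scheduling can be estimated with a simple count. This sketch assumes that exactly one slip cycle is inserted whenever the instruction immediately after a load reads the loaded register, and it ignores every other interlock; the tuple layout and function name are hypothetical.

```python
def load_use_slips(program):
    """Count load-use slip cycles in a straight-line instruction sequence.

    program is a list of (is_load, dest, sources) tuples in issue order.
    """
    slips = 0
    for previous, current in zip(program, program[1:]):
        prev_is_load, prev_dest, _ = previous
        _, _, cur_sources = current
        if prev_is_load and prev_dest in cur_sources:
            slips += 1                   # consumer sits in the load delay slot
    return slips


# A load of t0 followed directly by its consumer, then an independent instruction
unscheduled = [
    (True, "t0", ("a0",)),
    (False, "t1", ("t0", "a1")),
    (False, "t2", ("a2", "a3")),
]
# Same work with the independent instruction moved into the load delay slot
scheduled = [unscheduled[0], unscheduled[2], unscheduled[1]]
```

Under these assumptions the unscheduled sequence incurs one slip and the scheduled sequence none, which is the benefit of filling the load delay slot.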

Figure 3.4 shows the load delay of one pipeline cycle. 

Figure 3.4 CPU Pipeline Load Delay 

Interlock and Exception Handling 

Smooth pipeline flow is interrupted when cache misses or exceptions 
occur, or when data dependencies are detected. Interruptions handled 
using hardware, such as cache misses, are referred to as interlocks, while 
those that are handled using software are called exceptions. 

There are two types of interlocks: 

• stalls, which are resolved by halting the pipeline 

• slips, which require the back end of the pipeline to advance while the 
front end of the pipeline is held static 

At each cycle, exception and interlock conditions are checked for all 
active instructions. 

Because each exception or interlock condition corresponds to a 
particular pipeline stage, a condition can be traced back to the particular 
instruction in the exception/interlock stage, as shown in Figure 3.5. For 
instance, a Reserved Instruction (RI) exception is raised in the execution 
(A) stage. 

Figure 3.5 Correspondence of Pipeline Stage to Interlock Condition 

Table 3.1 and Table 3.2, which follow, describe the pipeline interlocks and 
exceptions listed in Figure 3.5. 



| Exception | Description |
|-----------|-------------|
| ITLB | Instruction Translation or Address Exception |
| Intr | External Interrupt |
| IBE | Instruction Bus Error |
| RI | Reserved Instruction |
| BP | Breakpoint |
| SC | System Call |
| CUn | Coprocessor Unusable |
| IPErr | Instruction Parity Error |
| OVF | Integer Overflow |
| FPE | FP Interrupt |
| ExTrap | EX Stage Traps |
| DTLB | Data Translation or Address Exception |
| TLBMod | TLB Modified |
| DBE | Data Bus Error |
| DPErr | Data Parity Error |
| NMI | Non-maskable Interrupt (or Soft Reset) |
| Reset | Reset |

Table 3.1 Pipeline Exceptions 



| Interlock | Description |
|-----------|-------------|
| ITM | Instruction TLB Miss |
| ICM | Instruction Cache Miss |
| CPE | Coprocessor Possible Exception |
| DCM | Data Cache Miss |
| LDI | Load Interlock |
| MDSt | Multiply/Divide Start |
| FCBsy | FP Coprocessor Busy |

Table 3.2 Pipeline Interlocks 

Exception Conditions 

When an exception condition occurs, the relevant instruction and all 
those that follow it in the pipeline are cancelled. Accordingly, any stall 
conditions and any later exception conditions that may have referenced 
this instruction are inhibited; there is no benefit in servicing stalls for a 
cancelled instruction. 

When an exceptional condition is detected for an instruction, the 
R4600/R4700 will kill it and all following instructions. When this 
instruction reaches the W stage, the exception flag causes it to write 
various CP0 registers with the exception state, change the current PC to 
the appropriate exception vector address and clear the exception bits of 
earlier pipeline stages. 

This implementation allows all preceding instructions to complete 
execution and prevents all subsequent instructions from completing. Thus 
the value in the EPC is sufficient to restart execution. It also ensures that 
exceptions are taken in the order of execution; an instruction taking an 
exception may itself be killed by an instruction further down the pipeline 
that takes an exception in a later cycle. 

Figure 3.6 shows the exception detection procedure (e.g., a reserved 
instruction exception). 

Figure 3.6 Exception Detection 



Stall Conditions 

Stalls are used to stop the pipeline for conditions detected after the R 
pipe-stage. When a stall occurs, the processor will resolve the condition 
and then the pipeline will continue. 

Figure 3.7 shows a data cache miss stall. 

Figure 3.7 Data Cache Miss 

The data cache miss is detected in the D pipe stage. If the cache line to 
be replaced is dirty — the W bit is set — the data is moved to the internal 
write buffer in the next cycle. The first doubleword of data is returned to 
the cache in 3 and the pipeline will then restart. The remainder of the 
cache line is returned in the subsequent cycles. The data to be written 
back will be returned to memory some time after the entire new cache line 
is returned. 

Slip Conditions 

During the 2R and 1A pipe-stages, internal logic will determine whether
it is possible to start the current instruction in this cycle. If all of the source
operands are available (either from the register file or via the internal
bypass logic) and all the hardware resources necessary to complete the
instruction will be available at the necessary time(s), then the instruction
"issues"; otherwise, the instruction will "slip". Slipped instructions are
retried on subsequent cycles until they issue. The backend of the pipeline
(stages D and W) will advance normally during slips in an attempt to
resolve the conflict. NOPs will be inserted into the bubble in the pipeline.
Instructions killed by branch-likely instructions, ERET, or exceptions will
not cause slips.
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Figure 3.8 shows an instruction cache miss. 
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Figure 3.8 Instruction cache miss 

Instruction cache misses are detected in R, as shown in Figure 3.8, and
the pipeline slips in its A stage. There can never be a writeback required
for an instruction cache miss, since writes are not allowed to the I cache
and dirty data can therefore never exist in it. Note that early restart is not
employed for instruction cache misses; the requested cache line will be
loaded into the cache in its entirety and, after that, the pipeline will restart.

R4600/R4700 Write Buffer 

The R4600/R4700 contains a write buffer to improve the performance 
of writes to the external memory. Writes to external memory, whether 
cache miss writebacks or stores to uncached or write-through addresses, 
use this on-chip write buffer. The write buffer holds up to four 64-bit 
address and data pairs. 

For a cache miss write-back, the entire buffer is used for the write-back 
data and allows the processor to proceed in parallel with the memory 
update. For uncached and write-through stores, the write buffer 
uncouples the CPU from the write to memory allowing increased 
performance over the R4000 family of processors. If the write buffer is full, 
additional stores will stall until there is room for them in the write buffer. 
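The stall-when-full behavior described above can be illustrated with a small software model. This is only a sketch: the class and method names are hypothetical, and the first-in, first-out drain order is an assumption that the text does not state.

```python
from collections import deque

# The text states that the buffer holds up to four 64-bit address/data pairs.
WRITE_BUFFER_ENTRIES = 4


class WriteBuffer:
    """Toy model of a four-entry write buffer (illustrative only)."""

    def __init__(self):
        self.entries = deque()

    def try_store(self, address, data):
        """Queue a store; return False if the CPU would have to stall."""
        if len(self.entries) >= WRITE_BUFFER_ENTRIES:
            return False
        self.entries.append((address, data & 0xFFFFFFFFFFFFFFFF))
        return True

    def drain_one(self):
        """Retire the oldest entry to memory (FIFO order is assumed)."""
        return self.entries.popleft() if self.entries else None
```

A fifth store returns False until `drain_one()` has freed a slot, which corresponds to the stall described in the text.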
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The R4600/R4700 processor provides a full-featured memory 
management unit (MMU) which uses an on-chip Translation Lookaside 
Buffer (TLB) to translate virtual addresses into physical addresses. 

This chapter describes the processor virtual and physical address 
spaces, the virtual-to-physical address translation, the operation of the 
TLB in making these translations, and those System Control Coprocessor 
(CP0) registers that provide the software interface to the TLB.

Translation Lookaside Buffer (TLB) 

Mapped virtual addresses are translated into physical addresses using
an on-chip TLB [1]. The TLB is a fully associative memory that holds 48
entries, which provide mapping to 48 odd/even page pairs (96 pages).
When address mapping is indicated, each TLB entry is checked
simultaneously for a match with the virtual address that is extended with
an ASID stored in the EntryHi register.

The size of a mapped page ranges from 4 Kbytes to 16 Mbytes, in
multiples of 4: that is, 4K, 16K, 64K, 256K, 1M, 4M, or 16M.
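The seven page sizes follow directly from the "multiples of 4" rule. The sketch below enumerates them and the matching number of offset bits; the function names are illustrative and not part of the manual.

```python
def page_sizes():
    """Return the seven page sizes in bytes: 4 Kbytes times 4**n, n = 0..6."""
    return [4096 * 4 ** n for n in range(7)]


def offset_bits(page_size):
    """Number of low-order address bits that form the offset within a page."""
    return page_size.bit_length() - 1
```

Each step multiplies the page size by four, so the offset grows by two bits per step, from 12 bits for a 4-Kbyte page to 24 bits for a 16-Mbyte page.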

Hits and Misses 

If there is a virtual address match, or hit, in the TLB, the physical page 
number is extracted from the TLB and concatenated with the offset to form 
the physical address (see Figure 4.1). 

If no match occurs (TLB miss), an exception is taken and software refills 
the TLB from the page table resident in memory. Software can write over 
a selected TLB entry or use a hardware mechanism to write into a random 
entry. 

Multiple Matches 

The R4600/R4700 does not provide any detection or shutdown 
mechanism for multiple matches in the TLB. There is no damage possible 
from this condition. The result is undefined for this condition. Software is 
expected never to allow this to occur. 

Address Spaces 

This section describes the virtual and physical address spaces and the
manner in which virtual addresses are converted or "translated" into
physical addresses in the TLB.

Virtual Address Space 

The processor virtual address can be either 32 or 64 bits wide,
depending on the mode of operation (user, supervisor, or kernel) and the
setting of the corresponding extended address bit in the Status register
(UX, SX, and KX).

• For the extended address bit = 0, addresses are 32 bits wide.

• For the extended address bit = 1, addresses are 64 bits wide.

Both 32-bit and 64-bit addresses wrap in the same way. For example, in
64-bit mode 0xffffffffffffffff will wrap to 0x0000000000000000. While the
R4400 slipped on shift of >32-bit or other shift variables, the R4600/
R4700 does not.



1. There are virtual-to-physical address translations that occur outside of the TLB.
For example, addresses in kseg0 and kseg1 spaces are unmapped translations. In
these spaces the physical address is 0x0000 0000 || VA[28:0].

Figure 4.1 shows the translation of a virtual address into a physical
address.



[Figure 4.1: the virtual address, made up of the ASID, VPN, and Offset fields, is translated through a TLB entry (ASID, VPN, PFN) into a physical address made up of the PFN and Offset.]

1. The virtual address (VA), represented by the virtual page number (VPN), is compared with the tag in the TLB.

2. If there is a match, the page frame number (PFN), representing the upper bits of the physical address (PA), is output from the TLB.

3. The Offset, which does not pass through the TLB, is then concatenated to the PFN.

Figure 4.1 Overview of a Virtual-to-Physical Address Translation

As shown in Figure 4.2 and Figure 4.3, the virtual address is extended
with an 8-bit address space identifier (ASID), which reduces the frequency
of TLB flushing when switching contexts. This 8-bit ASID is in the CP0
EntryHi register, described later in this chapter. The Global bit (G) is in the
EntryLo0 and EntryLo1 registers, described later in this chapter.

Physical Address Space 

Using a 36-bit address, the processor physical address space
encompasses 64 Gbytes. The following section describes the translation
of a virtual address to a physical address.

Virtual-to-Physical Address Translation 

Converting a virtual address to a physical address begins by comparing 
the virtual address from the processor with the virtual address in the TLB; 
there is a match when the virtual page number (VPN) of the address is the 
same as the VPN field of the entry, and either: 

• the Global (G) bit of the TLB entry is set, or 

• the ASID field of the virtual address is the same as the ASID field of 
the TLB entry. 

This match is referred to as a TLB hit. If there is no match, a TLB Miss
exception is taken by the processor and software is allowed to refill the TLB
from a page table of virtual/physical addresses in memory.

If there is a virtual address match in the TLB, the physical address is 
output from the TLB and concatenated with the Offset, which represents 
an address within the page frame space. The Offset does not pass through 
the TLB. 
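The match rule above (equal VPN, and either the Global bit set or equal ASID) can be sketched in software as follows. This is a simplified model, not the hardware algorithm: it assumes a single fixed page size, treats each entry as mapping one page rather than an odd/even pair, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TlbEntry:
    vpn: int          # virtual page number tag
    asid: int         # address space identifier of the owning process
    global_bit: bool  # when set, the ASID comparison is ignored
    pfn: int          # page frame number output on a hit


def translate(entries, vpn, asid, offset, offset_bits=12):
    """Return the physical address on a TLB hit, or None on a TLB miss."""
    for entry in entries:
        if entry.vpn == vpn and (entry.global_bit or entry.asid == asid):
            # The offset bypasses the TLB and is concatenated to the PFN.
            return (entry.pfn << offset_bits) | offset
    return None
```

A None result corresponds to the TLB Miss exception, after which software refills the TLB from the page table.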

Virtual-to-physical translation is described in greater detail throughout
the remainder of this chapter; Figure 4.19 on page 22 is a flow diagram of
the process.

The next two sections describe the 32-bit and 64-bit address 
translations. 
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32-bit Virtual Address Translation 

Figure 4.2 shows the virtual-to-physical-address translation of a 32-bit 
virtual address. 

• The top portion of Figure 4.2 shows a virtual address with a 12-bit, or
4Kbyte, page size, labelled Offset. The remaining 20 bits of the
address represent the VPN, and index the 1M-entry page table.

• The bottom portion of Figure 4.2 shows a virtual address with a 24-bit,
or 16Mbyte, page size, labelled Offset. The remaining 8 bits of the
address represent the VPN, and index the 256-entry page table.
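As a worked illustration of the two cases above, the following sketch splits a 32-bit virtual address into its VPN and Offset for a given offset width. The function name is illustrative only.

```python
def split_va32(va, offset_bits):
    """Split a 32-bit virtual address into (VPN, Offset)."""
    va &= 0xFFFFFFFF
    offset = va & ((1 << offset_bits) - 1)  # low-order bits within the page
    vpn = va >> offset_bits                 # remaining high-order bits
    return vpn, offset
```

With a 12-bit offset, 0x1234 5678 splits into VPN 0x12345 (20 bits) and Offset 0x678; with a 24-bit offset, the same address splits into VPN 0x12 (8 bits) and Offset 0x345678.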



[Figure 4.2: Top panel: virtual address with 1M (2^20) 4-Kbyte pages, made up of the ASID, a 20-bit VPN (bits 31:12), and a 12-bit Offset (bits 11:0). Bottom panel: virtual address with 256 (2^8) 16-Mbyte pages, made up of the ASID, an 8-bit VPN (bits 31:24), and a 24-bit Offset (bits 23:0). Bits 31, 30, and 29 of the virtual address select user, supervisor, or kernel address spaces. In both panels the virtual-to-physical translation in the TLB produces the PFN of a 36-bit physical address, and the Offset is passed unchanged to physical memory.]

Figure 4.2 32-bit Virtual Address Translation

64-bit Virtual Address Translation 

Figure 4.3 on page 4 shows the virtual-to-physical-address translation
of a 64-bit virtual address. This figure illustrates the two extremes in the
range of possible page sizes: a 4Kbyte page (12 bits) and a 16Mbyte page
(24 bits).

• The top portion of Figure 4.3 shows a virtual address with a
12-bit, or 4Kbyte, page size, labelled Offset. The remaining 28 bits of
the address represent the VPN, and index the 256M-entry page table.

• The bottom portion of Figure 4.3 shows a virtual address with a 24-bit,
or 16Mbyte, page size, labelled Offset. The remaining 16 bits of
the address represent the VPN, and index the 64K-entry page table.

[Figure 4.3: a 64-bit virtual address (ASID, VPN, Offset) is translated in the TLB into a 36-bit physical address (PFN, Offset); the Offset is passed unchanged to physical memory. The bottom panel shows a virtual address with 64K (2^16) 16-Mbyte pages, with a 16-bit VPN in bits 39:24 and a 24-bit Offset in bits 23:0.]

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Figure 4.3 64-bit Virtual Address Translation 

Operating Modes 

The processor has three operating modes that function in both 32- and 
64-bit operations: 

• User mode 

• Supervisor mode 

• Kernel mode 

These modes are described in the next three sections. 

User Mode Operations 

In User mode, a single, uniform virtual address space, labelled User
segment, is available; its size is:

• 2 Gbytes (2^31 bytes) for Status.UX = 0 (useg)

• 1 Tbyte (2^40 bytes) for Status.UX = 1 (xuseg)
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Figure 4.4 shows the User mode virtual address space. 



[Figure 4.4: User mode virtual address map. In 32-bit mode*, useg (2 GB, mapped) starts at 0x0000 0000, and addresses from 0x8000 0000 through 0xFFFF FFFF cause an Address Error. In 64-bit mode, xuseg (1 TB, mapped) starts at 0x0000 0000 0000 0000, and addresses from 0x0000 0100 0000 0000 through 0xFFFF FFFF FFFF FFFF cause an Address Error.]

Note: *For 32-bit virtual addresses, bit 31 is sign-extended through bits 63:32. Failure (i.e., bit 31 = 1) results in an Address Error exception.

Figure 4.4 User Mode Virtual Address Space



The User segment starts at address 0 and the current active user
process resides in either useg (in 32-bit virtual addressing) or xuseg (in
64-bit virtual addressing). The TLB identically maps all references to useg/
xuseg from all modes, and controls cache accessibility.

The processor operates in User mode when the Status register contains 
the following bit-values: 

• KSU bits = 10₂

• EXL = 0

• ERL = 0

In conjunction with these bits, the UX bit in the Status register selects
between 32- or 64-bit User virtual addressing as follows:

• when UX = 0, 32-bit useg space is selected

• when UX = 1, 64-bit xuseg space is selected

Table 4.1 lists the characteristics of the two user mode segments, useg
and xuseg.

                       Status Register Bit Values
Address Bit Values     KSU   EXL  ERL  UX   Segment Name  Address Range                                         Segment Size
32-bit, A(31) = 0      10₂   0    0    0    useg          0x0000 0000 through 0x7FFF FFFF                       2 Gbytes (2^31 bytes)
64-bit, A(63:40) = 0   10₂   0    0    1    xuseg         0x0000 0000 0000 0000 through 0x0000 00FF FFFF FFFF   1 Tbyte (2^40 bytes)

Table 4.1 32-bit and 64-bit User Mode Segments

32-bit User Mode (useg) 

In User mode, when Status.UX = 0, User mode virtual addressing is 
compatible with the 32-bit addressing model shown in Figure 4.4, and a 2- 
Gbyte user address space is available, labelled useg. 






Memory Management 



Chapter 4 



All valid User mode virtual addresses have their most-significant bit 
cleared to 0; any attempt to reference an address with the most-significant 
bit set while in User mode causes an Address Error exception. 

In 32-bit User mode virtual addressing, the TLB refill exception vector is 
used for TLB misses. 

The system maps all references to useg through the TLB, and bit 
settings within the TLB entry for the page determine the cacheability of a 
reference. 

64-bit User Mode (xuseg)

In User mode, when Status.UX = 1, User mode virtual addressing is
extended to the 64-bit model shown in Figure 4.4, and a 1-Tbyte user
address space is available, labelled xuseg.

All valid User mode virtual addresses have bits 63:40 equal to 0; an
attempt to reference an address with bits 63:40 not equal to 0 causes an
Address Error exception.

The extended addressing TLB refill exception vector is used for TLB 
misses. 
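The User mode address checks for useg and xuseg can be summarized in a short sketch. It assumes the address is presented as a 64-bit value (a 32-bit address being sign-extended, as noted under Figure 4.4); the function name is hypothetical.

```python
def user_address_ok(va, ux):
    """Return True if a User mode virtual address is valid, else False.

    False corresponds to an Address Error exception.
    """
    va &= 0xFFFFFFFFFFFFFFFF
    if ux:
        # xuseg: bits 63:40 must all be 0.
        return (va >> 40) == 0
    # useg: bit 31 must be 0; with sign extension, bits 63:31 are then all 0.
    return (va >> 31) == 0
```

For example, 0x7FFF FFFF is the last valid useg address, and 0x0000 00FF FFFF FFFF is the last valid xuseg address.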

Supervisor Mode Operations 

Supervisor mode is designed for layered operating systems in which a 
true kernel runs in R4600/R4700 Kernel mode, and the rest of the 
operating system runs in Supervisor mode. 

The processor operates in Supervisor mode when the Status register 
contains the following bit-values: 

• KSU = 01₂

• EXL = 0

• ERL = 0

In conjunction with these bits, the SX bit in the Status register selects
between 32- or 64-bit Supervisor mode virtual addressing:

• when SX = 0, 32-bit supervisor space virtual addressing is selected

• when SX = 1, 64-bit supervisor space virtual addressing is selected

Figure 4.5 shows Supervisor mode address mapping. Table 4.2, which
follows the figure, lists the characteristics of the supervisor mode
segments; descriptions of the address spaces follow.



[Figure 4.5: Supervisor mode virtual address map. In 32-bit mode*, suseg (2 GB, mapped) starts at 0x0000 0000 and sseg (0.5 GB, mapped) starts at 0xC000 0000; all other addresses cause an address error. In 64-bit mode, xsuseg (1 TB, mapped) starts at 0x0000 0000 0000 0000, xsseg (1 TB, mapped) starts at 0x4000 0000 0000 0000, and csseg (0.5 GB, mapped) starts at 0xFFFF FFFF C000 0000; all other addresses cause an address error.]

Note: *In 32-bit virtual addressing, bit 31 is sign-extended through bits
63:32. Failure results in an Address Error exception.

Figure 4.5 Supervisor Mode Virtual Address Space

                           Status Register Bit Values
Address Bit Values         KSU   EXL  ERL  SX   Segment Name  Address Range                                         Segment Size
32-bit, A(31) = 0          01₂   0    0    0    suseg         0x0000 0000 through 0x7FFF FFFF                       2 Gbytes (2^31 bytes)
32-bit, A(31:29) = 110₂    01₂   0    0    0    sseg          0xC000 0000 through 0xDFFF FFFF                       512 Mbytes (2^29 bytes)
64-bit, A(63:62) = 00₂     01₂   0    0    1    xsuseg        0x0000 0000 0000 0000 through 0x0000 00FF FFFF FFFF   1 Tbyte (2^40 bytes)
64-bit, A(63:62) = 01₂     01₂   0    0    1    xsseg         0x4000 0000 0000 0000 through 0x4000 00FF FFFF FFFF   1 Tbyte (2^40 bytes)
64-bit, A(63:62) = 11₂     01₂   0    0    1    csseg         0xFFFF FFFF C000 0000 through 0xFFFF FFFF DFFF FFFF   512 Mbytes (2^29 bytes)

Table 4.2 32-bit and 64-bit Supervisor Mode Segments

32-bit Supervisor Mode, User Space (suseg)

In Supervisor mode, when Status.SX = 0 and the most-significant bit of
the 32-bit virtual address is set to 0, the suseg virtual address space is
selected; it covers the full 2^31 bytes (2 Gbytes) of the current user address
space. The virtual address is extended with the contents of the 8-bit ASID
field to form a unique virtual address.

This mapped space starts at virtual address 0x0000 0000 and runs
through 0x7FFF FFFF.

32-bit Supervisor Mode, Supervisor Space (sseg)

In Supervisor mode, when Status.SX = 0 and the three most-significant
bits of the 32-bit virtual address are 110₂, the sseg virtual address space
is selected; it covers 2^29 bytes (512 Mbytes) of the current supervisor
address space. The virtual address is extended with the contents of the
8-bit ASID field to form a unique virtual address.

This mapped space begins at virtual address 0xC000 0000 and runs
through 0xDFFF FFFF.

64-bit Supervisor Mode, User Space (xsuseg)

In Supervisor mode, when Status.SX = 1 and bits 63:62 of the virtual
address are set to 00₂, the xsuseg virtual address space is selected; it
covers the full 2^40 bytes (1 Tbyte) of the current user address space. The
virtual address is extended with the contents of the 8-bit ASID field to form
a unique virtual address.

This mapped space starts at virtual address 0x0000 0000 0000 0000
and runs through 0x0000 00FF FFFF FFFF.

64-bit Supervisor Mode, Current Supervisor Space (xsseg)

In Supervisor mode, when Status.SX = 1 and bits 63:62 of the virtual
address are set to 01₂, the xsseg current supervisor virtual address space
is selected. The virtual address is extended with the contents of the 8-bit
ASID field to form a unique virtual address.

This mapped space begins at virtual address 0x4000 0000 0000 0000
and runs through 0x4000 00FF FFFF FFFF.
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64-bit Supervisor Mode, Separate Supervisor Space (csseg) 

In Supervisor mode, when Status.SX = 1 and bits 63:62 of the virtual
address are set to 11₂, the csseg separate supervisor virtual address space
is selected. Addressing of the csseg is compatible with addressing sseg in
32-bit mode. The virtual address is extended with the contents of the
8-bit ASID field to form a unique virtual address.

This mapped space begins at virtual address 0xFFFF FFFF C000 0000
and runs through 0xFFFF FFFF DFFF FFFF.
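The Supervisor mode segments can be told apart by range checks on the virtual address. The sketch below uses the address ranges listed in Table 4.2; the function name and the use of None to stand for an Address Error are illustrative choices, not part of the manual.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def supervisor_segment(va, sx):
    """Return the Supervisor mode segment name, or None for an address error."""
    va &= MASK64
    if not sx:
        va32 = va & 0xFFFFFFFF  # only the low 32 bits are examined here
        if va32 <= 0x7FFFFFFF:
            return "suseg"
        if 0xC0000000 <= va32 <= 0xDFFFFFFF:
            return "sseg"
        return None
    if va <= 0x000000FFFFFFFFFF:
        return "xsuseg"
    if 0x4000000000000000 <= va <= 0x400000FFFFFFFFFF:
        return "xsseg"
    if 0xFFFFFFFFC0000000 <= va <= 0xFFFFFFFFDFFFFFFF:
        return "csseg"
    return None
```

Note that csseg occupies the same low 32 bits as sseg, which is what makes the two addressing models compatible.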

Kernel Mode Operations 

The processor operates in Kernel mode when the Status register 
contains one of the following values: 

• KSU = 00₂

• EXL = 1

• ERL = 1

In conjunction with these bits, the KX bit in the Status register selects
between 32- or 64-bit Kernel mode addressing:

• when KX = 0, 32-bit kernel space virtual addressing is selected

• when KX = 1, 64-bit kernel space virtual addressing is selected
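Taken together, the Status register conditions given for User, Supervisor, and Kernel mode amount to the following decision, sketched here with the KSU, EXL, and ERL fields passed in as already-extracted values. The function name is hypothetical, and the handling of KSU = 11₂ (which the text does not describe) is a placeholder.

```python
def operating_mode(ksu, exl, erl):
    """Return the operating mode implied by the KSU, EXL, and ERL fields."""
    # Kernel mode applies if any one of the three conditions holds.
    if exl or erl or ksu == 0b00:
        return "kernel"
    if ksu == 0b01:
        return "supervisor"
    if ksu == 0b10:
        return "user"
    return None  # KSU = 11 is not described in the text
```

Because EXL and ERL override KSU, the processor is in Kernel mode during exception handling even when KSU still holds the User or Supervisor encoding.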

The processor enters Kernel mode whenever an exception is detected 
and it remains in Kernel mode until an Exception Return (ERET) 
instruction is executed. The ERET instruction restores the processor to 
the mode existing prior to the exception. 

Kernel mode virtual address space is divided into regions differentiated 
by the high-order bits of the virtual address, as shown in Figure 4.6. 
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Figure 4.6 Kernel Mode Address Space 
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Table 4.3 lists the characteristics of the 32-bit kernel mode segments, 
and Table 4.4 lists the characteristics of the 64-bit kernel mode segments.



For every row, the Status register is one of these values: KSU = 00₂, or EXL = 1, or ERL = 1.

Address Bit Values   KX   Segment Name  Address Range                     Segment Size
A(31) = 0            0    kuseg         0x0000 0000 through 0x7FFF FFFF   2 Gbytes (2^31 bytes)
A(31:29) = 100₂      0    kseg0         0x8000 0000 through 0x9FFF FFFF   512 Mbytes (2^29 bytes)
A(31:29) = 101₂      0    kseg1         0xA000 0000 through 0xBFFF FFFF   512 Mbytes (2^29 bytes)
A(31:29) = 110₂      0    ksseg         0xC000 0000 through 0xDFFF FFFF   512 Mbytes (2^29 bytes)
A(31:29) = 111₂      0    kseg3         0xE000 0000 through 0xFFFF FFFF   512 Mbytes (2^29 bytes)

Table 4.3 32-bit Kernel Mode Segments

32-bit Kernel Mode, User Space (kuseg)

In Kernel mode, when Status.KX = 0 and the most-significant bit of the
virtual address, A31, is cleared, the 32-bit kuseg virtual address space is
selected; it covers the full 2^31 bytes (2 Gbytes) of the current user address
space. The virtual address is extended with the contents of the 8-bit ASID
field to form a unique virtual address.

32-bit Kernel Mode, Kernel Space 0 (kseg0)

In Kernel mode, when Status.KX = 0 and the most-significant three bits
of the virtual address are 100₂, the 32-bit kseg0 virtual address space is
selected; it is the current 2^29-byte (512-Mbyte) kernel physical space.

References to kseg0 are not mapped through the TLB; the physical
address selected is defined by subtracting 0x8000 0000 from the virtual
address (physical address = 0x0000 0000 || VA[28:0]).

The K0 field of the Config register, described in this chapter, controls
cacheability and coherency.

32-bit Kernel Mode, Kernel Space 1 (kseg1)

In Kernel mode, when Status.KX = 0 and the most-significant three bits
of the 32-bit virtual address are 101₂, the 32-bit kseg1 virtual address space
is selected; it is the current 2^29-byte (512-Mbyte) kernel physical space.

References to kseg1 are not mapped through the TLB; the physical
address selected is defined by subtracting 0xA000 0000 from the virtual
address (physical address = 0x0000 0000 || VA[28:0]).

Caches are disabled for accesses to these addresses, and physical
memory (or memory-mapped I/O device registers) is accessed directly.
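The unmapped translations for kseg0 and kseg1 reduce to keeping the low 29 bits of the virtual address, which is equivalent to the subtraction described above. A minimal sketch, with an illustrative function name:

```python
def unmapped_physical_address(va):
    """Return (segment, physical address) for kseg0/kseg1, else None."""
    va &= 0xFFFFFFFF
    region = va >> 29  # the three most-significant bits, A(31:29)
    if region == 0b100:
        return "kseg0", va & 0x1FFFFFFF  # same as va - 0x80000000
    if region == 0b101:
        return "kseg1", va & 0x1FFFFFFF  # same as va - 0xA0000000
    return None  # other segments are mapped through the TLB
```

Both segments therefore alias the same 512 Mbytes of physical memory; they differ only in cacheability.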

32-bit Kernel Mode, Supervisor Space (ksseg) 

In Kernel mode, when Status.KX = 0 and the most-significant three bits
of the 32-bit virtual address are 110₂, the ksseg virtual address space is
selected; it is the current 2^29-byte (512-Mbyte) supervisor virtual space.
The virtual address is extended with the contents of the 8-bit ASID field to
form a unique virtual address.






32-bit Kernel Mode, Kernel Space 3 (kseg3)

In Kernel mode, when Status.KX = 0 and the most-significant three bits
of the 32-bit virtual address are 111₂, the kseg3 virtual address space is
selected; it is the current 2^29-byte (512-Mbyte) kernel virtual space. The
virtual address is extended with the contents of the 8-bit ASID field to form
a unique virtual address.



For every row, the Status register is one of these values: KSU = 00₂, or EXL = 1, or ERL = 1.

Address Bit Values                KX   Segment Name  Address Range                                         Segment Size
A(63:62) = 00₂                    1    xkuseg        0x0000 0000 0000 0000 through 0x0000 00FF FFFF FFFF   1 Tbyte (2^40 bytes)
A(63:62) = 01₂                    1    xksseg        0x4000 0000 0000 0000 through 0x4000 00FF FFFF FFFF   1 Tbyte (2^40 bytes)
A(63:62) = 10₂                    1    xkphys        0x8000 0000 0000 0000 through 0xBFFF FFFF FFFF FFFF   8 2^36-byte spaces
A(63:62) = 11₂                    1    xkseg         0xC000 0000 0000 0000 through 0xC000 00FF 7FFF FFFF   (2^40 - 2^31) bytes
A(63:62) = 11₂, A(61:31) = -1     1    ckseg0        0xFFFF FFFF 8000 0000 through 0xFFFF FFFF 9FFF FFFF   512 Mbytes (2^29 bytes)
A(63:62) = 11₂, A(61:31) = -1     1    ckseg1        0xFFFF FFFF A000 0000 through 0xFFFF FFFF BFFF FFFF   512 Mbytes (2^29 bytes)
A(63:62) = 11₂, A(61:31) = -1     1    cksseg        0xFFFF FFFF C000 0000 through 0xFFFF FFFF DFFF FFFF   512 Mbytes (2^29 bytes)
A(63:62) = 11₂, A(61:31) = -1     1    ckseg3        0xFFFF FFFF E000 0000 through 0xFFFF FFFF FFFF FFFF   512 Mbytes (2^29 bytes)

Table 4.4 64-bit Kernel Mode Segments

64-bit Kernel Mode, User Space (xkuseg) 

In Kernel mode, when Status.KX = 1 and bits 63:62 of the 64-bit virtual
address are 00₂, the xkuseg virtual address space is selected; it covers the
current user address space. The virtual address is extended with the
contents of the 8-bit ASID field to form a unique virtual address.

As a special feature for the ECC handler, if the ERL bit of the Status
register is set, the user address region becomes a 2^31-byte unmapped,
uncached space. This allows the ECC exception code to operate uncached
using r0 as a base register.

64-bit Kernel Mode, Current Supervisor Space (xksseg)

In Kernel mode, when Status.KX = 1 and bits 63:62 of the 64-bit virtual
address are 01₂, the xksseg virtual address space is selected; it is the
current supervisor virtual space. The virtual address is extended with the
contents of the 8-bit ASID field to form a unique virtual address.
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64-bit Kernel Mode, Physical Spaces (xkphys) 

In Kernel mode, when Status.KX = 1 and bits 63:62 of the 64-bit virtual
address are 10₂, the xkphys virtual address space is selected; it is a set of
eight 2^36-byte kernel physical spaces. Accesses with address bits 58:36
not equal to 0 cause an address error.

References to this space are not mapped; the physical address selected
is taken from bits 35:0 of the virtual address. Bits 61:59 of the virtual
address specify the cacheability and coherency attributes, as shown in
Table 4.5.



Value (61:59)  Cacheability and Coherency Attributes                       Starting Address
0              Cacheable, noncoherent, write-through, no write allocate    0x8000 0000 0000 0000
1              Cacheable, noncoherent, write-through, write allocate       0x8800 0000 0000 0000
2              Uncached                                                    0x9000 0000 0000 0000
3              Cacheable, noncoherent                                      0x9800 0000 0000 0000
4-7            Reserved                                                    0xA000 0000 0000 0000



Table 4.5 Cacheability and Coherency Attributes 
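The xkphys decoding described above (bits 63:62 select the space, bits 61:59 give the attribute, bits 58:36 must be 0, and bits 35:0 form the physical address) can be sketched as follows. The function name and the use of ValueError to stand for an address error are illustrative assumptions.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def decode_xkphys(va):
    """Return (attribute value, physical address), or None if not xkphys."""
    va &= MASK64
    if (va >> 62) != 0b10:
        return None  # bits 63:62 do not select xkphys
    if (va >> 36) & 0x7FFFFF:
        # Bits 58:36 (23 bits) must all be 0.
        raise ValueError("address error: bits 58:36 not equal to 0")
    attribute = (va >> 59) & 0x7       # bits 61:59, see Table 4.5
    physical = va & ((1 << 36) - 1)    # bits 35:0
    return attribute, physical
```

For instance, 0x9000 0000 0000 1000 decodes to attribute value 2 (uncached) and physical address 0x1000.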

64-bit Kernel Mode, Kernel Space (xkseg) 

In Kernel mode, when Status.KX = 1 and bits 63:62 of the 64-bit virtual
address are 11₂, the address space selected is one of the following:

• kernel virtual space, xkseg, the current supervisor virtual space; the
virtual address is extended with the contents of the 8-bit ASID field to
form a unique virtual address

• one of the four 32-bit kernel compatibility spaces, as described in the
next section.

64-bit Kernel Mode, Compatibility Spaces (ckseg1:0, cksseg, ckseg3)

In Kernel mode, when Status.KX = 1, bits 63:62 of the 64-bit virtual
address are 11₂, and bits 61:31 of the virtual address equal "-1", the lower
two bytes of address, as shown in Figure 4.6, select one of the following
512-Mbyte compatibility spaces.

• ckseg0. This 64-bit virtual address space is an unmapped region,
compatible with the 32-bit address model kseg0. The K0 field of the
Config register, described in this chapter, controls cacheability and
coherency.

• ckseg1. This 64-bit virtual address space is an unmapped and
uncached region, compatible with the 32-bit address model kseg1.

• cksseg. This 64-bit virtual address space is the current supervisor
virtual space, compatible with the 32-bit address model ksseg.

• ckseg3. This 64-bit virtual address space is kernel virtual space,
compatible with the 32-bit address model kseg3.

System Control Coprocessor 

The System Control Coprocessor (CP0) is implemented as an integral
part of the CPU, and supports memory management, address translation,
exception handling, and other privileged operations. CP0 contains the
registers shown in Figure 4.7 plus a 48-entry TLB. The sections that follow
describe how the processor uses each of the memory management-related
registers.

Each CP0 register has a unique number that identifies it; this number
is referred to as the register number. For instance, the PageMask register
is register number 5.

[Figure 4.7: the CP0 registers, each labelled with its register number, and the TLB. The figure marks which registers are used with the memory management system and which are used with exception processing; see Chapter 5 for details of the latter.]

Figure 4.7 CP0 Registers and the TLB

Format of a TLB Entry 

Figure 4.8 shows the TLB entry formats for both 32- and 64-bit virtual
addressing. Each field of an entry has a corresponding field in the EntryHi,
EntryLo0, EntryLo1, or PageMask registers, as shown in Figure 4.9 and
Figure 4.10; for example, the Mask field of the TLB entry is also held in the
PageMask register.
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Figure 4.8 Format of a TLB Entry 

The formats of the EntryHi, EntryLo0, EntryLo1, and PageMask registers
are nearly the same as the TLB entry. The one exception is the Global field
(G bit), which is used in the TLB, but is reserved in the EntryHi register.
Figure 4.9 and Figure 4.10 describe the TLB entry fields that are shown in
Figure 4.8.

PageMask Register

Mask    Page comparison mask.
0       Reserved. Must be written as zeroes, and returns zeroes when read.

EntryHi Register (64-bit mode)

Bits 63:62   R (2 bits)
Bits 61:40   FILL (22 bits)
Bits 39:13   VPN2 (27 bits)
Bits 12:8    0
Bits 7:0     ASID (8 bits)

VPN2    Virtual page number divided by two (maps to two pages).
ASID    Address space ID field. An 8-bit field that lets multiple processes share the TLB; each
        process has a distinct mapping of otherwise identical virtual page numbers.
R       Region (00 -> user, 01 -> supervisor, 11 -> kernel), used to match vAddr[63:62].
Fill    Reserved. Returns zero when read, ignored on writes.
0       Reserved. Must be written as zeroes, and returns zeroes when read.

Figure 4.9 Fields of the PageMask and EntryHi Registers

EntryLo0 and EntryLo1 Registers

PFN     Page frame number; the upper bits of the physical address.
C       Specifies the TLB page coherency attribute; see Table 4.6.
D       Dirty. If this bit is set, the page is marked as dirty and, therefore, writable. This bit is
        actually a write-protect bit that software can use to prevent alteration of data.
V       Valid. If this bit is set, it indicates that the TLB entry is valid; otherwise, a TLBL or TLBS
        miss occurs.
G       Global. If this bit is set in both Lo0 and Lo1, then the processor ignores the ASID during
        TLB lookup.
0       Reserved. Must be written as zeroes, and returns zeroes when read.

Figure 4.10 Fields of the EntryLo0 and EntryLo1 Registers

The TLB page coherency attribute (C) bits specify whether references to
the page should be cached; if cached, the algorithm selects between several
coherency attributes. Table 4.6 shows the coherency attributes selected
by the C bits.



C(5:3) Value   Page Coherency Attribute
0              Cacheable, noncoherent, write-through, no write allocate
1              Cacheable, noncoherent, write-through, write allocate
2              Uncached
3              Cacheable, noncoherent, write-back
4-7            Reserved



Table 4.6 TLB Page Coherency (C) Bit Values 
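Software that inspects or builds TLB entries often needs to interpret the C field. The lookup below simply restates Table 4.6; it takes the 3-bit C value as an already-extracted number, and the names are illustrative only.

```python
# Restates Table 4.6; values 4 through 7 are reserved.
PAGE_COHERENCY = {
    0: "cacheable, noncoherent, write-through, no write allocate",
    1: "cacheable, noncoherent, write-through, write allocate",
    2: "uncached",
    3: "cacheable, noncoherent, write-back",
}


def coherency_attribute(c_value):
    """Return the page coherency attribute for a 3-bit C value."""
    return PAGE_COHERENCY.get(c_value & 0x7, "reserved")


def is_cached(c_value):
    """True when the C value selects one of the cacheable attributes."""
    return (c_value & 0x7) in (0, 1, 3)
```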

CP0 Registers

The following sections describe the CP0 registers (shown in Figure 4.7
on page 13) that are assigned specifically as a software interface with
memory management (each register is followed by its register number in
parentheses).

• Index register (CP0 register number 0)

• Random register (1)

• EntryLo0 (2) and EntryLo1 (3) registers

• PageMask register (5) 

• Wired register (6) 

• EntryHi register (10) 

• PRId register (15) 

• Config register (16) 

• LLAddr register (17) 

• TagLo (28) and TagHi (29) registers 
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Index Register (0) 

The Index register is a 32-bit, read/write register containing six bits to 
index an entry in the TLB. The high-order bit of the register shows the 
success or failure of a TLB Probe (TLBP) instruction. 

The Index register also specifies the TLB entry affected by TLB Read 
(TLBR) or TLB Write Index (TLBWI) instructions. 

Figure 4.11 shows the format of the Index register; Table 4.7, which 
follows the figure, describes the Index register fields. 
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Figure 4. 1 1 Index Register 



Field     Description

P         Probe failure. Set to 1 when the previous TLB Probe
          (TLBP) instruction was unsuccessful.
Index     Index to the TLB entry affected by the TLB Read and
          TLB Write instructions.
0         Reserved. Must be written as zeroes, and returns
          zeroes when read.



Table 4.7 Index Register Field Descriptions 
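As a minimal sketch of how software might interpret the Index register after a TLBP, the helper below checks the high-order P bit and otherwise returns the six-bit Index field. Reading the register itself requires a privileged coprocessor move, which is outside this sketch; the function name is hypothetical.

```c
#include <stdint.h>

#define INDEX_P_BIT     0x80000000u  /* high-order bit: probe failure */
#define INDEX_FIELD     0x0000003Fu  /* six-bit TLB index */

/* Returns the TLB index selected by the last probe,
 * or -1 if the P bit reports that the probe failed. */
static int index_probe_result(uint32_t index_reg)
{
    if (index_reg & INDEX_P_BIT)
        return -1;
    return (int)(index_reg & INDEX_FIELD);
}
```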

Random Register (1) 

The Random register is a read-only register of which six bits index an 
entry in the TLB. This register decrements as each instruction executes, 
and its values range between an upper and a lower bound, as follows: 

• A lower bound is set by the number of TLB entries reserved for exclu- 
sive use by the operating system (the contents of the Wired register). 

• An upper bound is set by the total number of TLB entries. Thus the
upper bound is 47 (the TLB entries are numbered from 0 to 47).

The R4600/R4700 implements this register differently from the 
R4000: The R4000 counts both valid and invalid instructions, while the 
R4600/R4700 counts only valid instructions. 

The Random register specifies the entry in the TLB that is affected by the 
TLB Write Random instruction. The register does not need to be read for 
this purpose; however, the register is readable to verify proper operation of 
the processor. 

To simplify testing, the Random register is set to the value of the upper 
bound upon system reset. This register is also set to the upper bound 
when the Wired register is written. 
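The bounds described above can be pictured with a simplified software model. This is only a sketch: it assumes the counter steps once per modeled instruction and wraps from the Wired value back to 47, and it does not model pipeline details. All names are hypothetical.

```c
#define TLB_ENTRIES 48u   /* entries are numbered 0 to 47 */

typedef struct {
    unsigned random;  /* modeled Random register value */
    unsigned wired;   /* modeled Wired register value (lower bound) */
} tlb_random_model;

/* System reset: Wired is 0 and Random is set to the upper bound. */
static void model_reset(tlb_random_model *m)
{
    m->wired = 0u;
    m->random = TLB_ENTRIES - 1u;
}

/* Writing Wired also sets Random to the upper bound. */
static void model_write_wired(tlb_random_model *m, unsigned wired)
{
    m->wired = wired;
    m->random = TLB_ENTRIES - 1u;
}

/* One decrement step; wrap to the upper bound at the lower bound. */
static void model_step(tlb_random_model *m)
{
    if (m->random <= m->wired)
        m->random = TLB_ENTRIES - 1u;
    else
        m->random--;
}
```

In this model the Random value never drops below the Wired value, which is why a TLB Write Random cannot overwrite a wired entry.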

Figure 4.12 shows the format of the Random register; Table 4.8 on
page 17 describes the Random register fields.
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Figure 4.12 Random Register 
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Field 


Description 


Random    TLB random index

0         Reserved. Must be written as zeroes, and returns zeroes when read.



Table 4.8 Random Register Field Descriptions 

EntryLo0 (2) and EntryLo1 (3) Registers

The EntryLo register consists of two registers that have identical
formats:

• EntryLo0 is used for even virtual pages.

• EntryLo1 is used for odd virtual pages.

The EntryLo0 and EntryLo1 registers are read/write registers. They
hold the physical page frame number (PFN) of the TLB entry for even and
odd pages, respectively, when performing TLB read and write operations.
Figure 4.10 on page 15 shows the format of these registers.

PageMask Register (5) 

The PageMask register is a read/write register used for reading from or 
writing to the TLB; it holds a comparison mask that sets the variable page 
size for each TLB entry, as shown in Table 4.9. 

TLB read and write operations use this register as either a source or a
destination; when virtual addresses are presented for translation into
a physical address, the corresponding bits in the TLB identify which virtual
address bits among bits 24:13 are used in the comparison.

When the Mask field is not one of the values shown in Table 4.9, the 
operation of the TLB is undefined. 
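The contents of Table 4.9 are not reproduced in this text. As a hedged sketch, the helper below derives a PageMask register value from a page size under the usual R4000-family convention: page sizes from 4 Kbytes to 16 Mbytes in multiples of four, with the Mask field occupying register bits 24:13. Both the size list and the function name are assumptions; check them against Table 4.9 before relying on them.

```c
#include <stdint.h>

#define PAGEMASK_INVALID 0xFFFFFFFFu

/* Returns the PageMask register value for a supported page size,
 * or PAGEMASK_INVALID for any other size (TLB operation would be
 * undefined for a Mask value outside the documented set). */
static uint32_t pagemask_for_size(uint32_t page_bytes)
{
    uint32_t size;

    /* Assumed supported sizes: 4K, 16K, 64K, 256K, 1M, 4M, 16M. */
    for (size = 4096u; size <= 16u * 1024u * 1024u; size <<= 2) {
        if (size == page_bytes)
            return ((size >> 12) - 1u) << 13;  /* Mask in bits 24:13 */
    }
    return PAGEMASK_INVALID;
}
```

For example, a 4-Kbyte page yields a Mask of zero, so every bit in 24:13 takes part in the comparison, while larger pages set low-order Mask bits to exclude them.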
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Table 4.9 Mask Field Values for Page Sizes 
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Wired Register (6) 

The Wired register is a read/write register that specifies the boundary
between the wired and random entries of the TLB, as shown in Figure 4.13.
Wired entries are nonreplaceable entries, which cannot be overwritten by
a TLB Write Random operation. Random entries can be overwritten.



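TLB: the entries below the value in the Wired register form the range of
wired entries; the entries from the Wired register value up to entry 47
form the range of random entries.

Figure 4.13 Wired Register Boundary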

The Wired register is set to 0 upon system reset. Writing this register
also sets the Random register to the value of its upper bound (see Random
register, above). Figure 4.14 shows the format of the Wired register;
Table 4.10, which follows the figure, describes the register fields.
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Figure 4.14 Wired Register 



Field     Description

Wired     TLB Wired boundary (the number of wired TLB entries)
0         Reserved. Must be written as zeroes, and returns zeroes
          when read.



Table 4.10 Wired Register Field Descriptions 

EntryHi Register (CP0 Register 10)

The EntryHi register holds the high-order bits of a TLB entry for TLB 
read and write operations. 

The EntryHi register is accessed by the TLB Probe, TLB Write Random, 
TLB Write Indexed, and TLB Read Indexed instructions. 

Figure 4.9 shows the format of this register. 

When either a TLB refill, TLB invalid, or TLB modified exception occurs, 
the EntryHi register is loaded with the virtual page number (VPN2) and the 
ASID of the virtual address that did not have a matching TLB entry. (See 
Chapter 5 for more information about these exceptions.) 
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Processor Revision Identifier (PRId) Register (15) 

The 32-bit, read-only Processor Revision Identifier (PRId) register
contains information identifying the implementation and revision level of
the CPU and CP0. Figure 4.15 shows the format of the PRId register;
Table 4.11 describes the PRId register fields.
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Figure 4.15 Processor Revision Identifier Register Format 



Field     Description

Imp       Implementation number
          R4600: Imp = 0x20
          R4700: Imp = 0x21
Rev       Revision number
0         Reserved. Must be written as zeroes, and returns zeroes when read.



Table 4.11 PRId Register Fields 

The low-order byte (bits 7:0) of the PRId register is interpreted as a
revision number, and the high-order byte (bits 15:8) is interpreted as an
implementation number. As Table 4.11 shows, the implementation number is
0x20 for the R4600 and 0x21 for the R4700. The contents of the high-order
halfword (bits 31:16) of the register are reserved.

The revision number is stored as a value in the form y.x, where y is a 
major revision number in bits 7:4 and x is a minor revision number in bits 
3:0. 
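The field boundaries given above translate directly into shifts and masks. The following sketch splits a PRId value into its implementation number and its major and minor revision numbers; the structure and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    unsigned imp;        /* implementation number, bits 15:8 */
    unsigned rev_major;  /* major revision y, bits 7:4 */
    unsigned rev_minor;  /* minor revision x, bits 3:0 */
} prid_fields;

static prid_fields prid_decode(uint32_t prid)
{
    prid_fields f;
    f.imp       = (prid >> 8) & 0xFFu;
    f.rev_major = (prid >> 4) & 0xFu;
    f.rev_minor = prid & 0xFu;
    return f;
}

/* Map the implementation numbers listed in Table 4.11 to a name. */
static const char *prid_part_name(unsigned imp)
{
    switch (imp) {
    case 0x20u: return "R4600";
    case 0x21u: return "R4700";
    default:    return "unknown";
    }
}
```

Software should branch on the implementation number only; the revision fields are shown for completeness and should not be used to characterize the chip.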

The revision number can distinguish some chip revisions; however, there
is no guarantee that changes to the chip will necessarily be reflected in the
PRId register, or that changes to the revision number necessarily reflect
real chip changes. For this reason, these values are not listed and software
should not rely on the revision number in the PRId register to characterize
the chip. Certain attributes, such as cache size, are independent of
implementation number.

Config Register (16) 

The Config register specifies various configuration options selected on
R4600/R4700 processors; Table 4.12 lists these options.

Some configuration options, as defined by Config bits 31:3, are set by
the hardware during reset and are included in the Config register as
read-only status bits for the software to access. The K0 field (Config
register bits 2:0) is the only read/write field and is controlled by
software; on reset this field is undefined.

Figure 4.16 shows the format of the Config register; Table 4.12, which 
follows the figure, describes the Config register fields. 
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Figure 4.16 Config Register Format 
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Field 


Description 


EC      System clock ratio:
        0 -> processor clock frequency divided by 2
        1 -> processor clock frequency divided by 3
        2 -> processor clock frequency divided by 4
        3 -> processor clock frequency divided by 5
        4 -> processor clock frequency divided by 6
        5 -> processor clock frequency divided by 7
        6 -> processor clock frequency divided by 8
        7 -> Reserved

EP      Writeback data rate:
        0 -> DDDD              Doubleword every cycle
        1 -> DDxDDx            2 Doublewords every 3 cycles
        2 -> DDxxDDxx          2 Doublewords every 4 cycles
        3 -> DxDxDxDx          2 Doublewords every 4 cycles
        4 -> DDxxxDDxxx        2 Doublewords every 5 cycles
        5 -> DDxxxxDDxxxx      2 Doublewords every 6 cycles
        6 -> DxxDxxDxxDxx      2 Doublewords every 6 cycles
        7 -> DDxxxxxDDxxxxx    2 Doublewords every 7 cycles
        8 -> DxxxDxxxDxxxDxxx  2 Doublewords every 8 cycles
        9-15 Reserved

BE      BigEndianMem
        0 -> Little endian
        1 -> Big endian

IC      Primary I-cache size (I-cache size = 2^(12+IC) bytes). In the
        R4600/R4700 processor, this is set to 16 Kbytes (IC = 010).

DC      Primary D-cache size (D-cache size = 2^(12+DC) bytes). In the
        R4600/R4700 processor, this is set to 16 Kbytes (DC = 010).

IB      Primary I-cache line size
        1 -> 32 bytes (8 words)

DB      Primary D-cache line size
        1 -> 32 bytes (8 words)

K0      kseg0 coherency algorithm (see EntryLo0 and EntryLo1 registers)

Others  Reserved. Returns indicated values when read.



Table 4.12 Config Register Fields 
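The encodings in Table 4.12 can be decoded as sketched below. Only the K0 position (bits 2:0) is stated in the text; the other bit positions used here (EC in bits 30:28, IC in bits 11:9, DC in bits 8:6) follow the standard R4000-family Config layout and should be treated as assumptions, since Figure 4.16 is not reproduced. Function names are hypothetical.

```c
#include <stdint.h>

/* System clock divisor from EC (assumed bits 30:28).
 * EC = 0..6 selects divide-by 2..8; EC = 7 is reserved (returns 0). */
static unsigned config_clock_divisor(uint32_t cfg)
{
    unsigned ec = (cfg >> 28) & 0x7u;
    return (ec == 7u) ? 0u : ec + 2u;
}

/* Primary cache sizes: 2^(12 + field) bytes. */
static uint32_t config_icache_bytes(uint32_t cfg)
{
    return (uint32_t)1u << (12u + ((cfg >> 9) & 0x7u));  /* IC, assumed 11:9 */
}

static uint32_t config_dcache_bytes(uint32_t cfg)
{
    return (uint32_t)1u << (12u + ((cfg >> 6) & 0x7u));  /* DC, assumed 8:6 */
}

/* K0 (bits 2:0) is the only software-writable field. */
static uint32_t config_with_k0(uint32_t cfg, unsigned k0)
{
    return (cfg & ~(uint32_t)0x7u) | (k0 & 0x7u);
}
```

Because K0 is undefined at reset, boot code would typically compute a new value with a helper like `config_with_k0` and write it back before using kseg0 for cached accesses.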

Load Linked Address (LLAddr) Register (17) 

The read/write Load Linked Address (LLAddr) register contains the 
physical address read by the most recent Load Linked instruction. 

This register is for diagnostic purposes only, and serves no function 
during normal operation. 

Figure 4.17 shows the format of the LLAddr register; PAddr represents
bits of the physical address, PA(35:4).
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Figure 4.17 LLAddr Register Format 

Cache Tag Registers [TagLo (28) and TagHi (29)] 

The TagLo and TagHi registers are 32-bit read/write registers that hold
the primary cache tag and parity during cache initialization, cache
diagnostics, or cache error processing. The Tag registers are written by the
CACHE and MTC0 instructions.

The P field of these registers is ignored on Index Store Tag operations.
Parity is computed by the store operation.

The Windows NT operating system uses the TagLo CP0 register to save and
restore gp registers in the TLB refill exception handler. Thus, all 32 bits
must be present, even though they have no use for the primary purpose of
TagLo.

Figure 4.18 shows the format of these registers for primary cache 
operations. Table 4.13 lists the field definitions of the TagLo and TagHi 
registers. 



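TagLo fields, from high-order to low-order bits: PTagLo (bits 31:8),
PState (bits 7:6), RWNT (bits 5:3), F (bit 2), followed by two single-bit
fields in bits 1 and 0.

TagHi: bits 31:0 are reserved.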



Figure 4.18 TagLo and TagHi Register (P-cache) Formats 



Field     Description

PTagLo    Specifies the physical address bits 35:12
PState    Specifies the primary cache state
P         Specifies the primary tag even parity bit
F         The FIFO bit used to implement FIFO refill of the cache
RWNT      Read/Write bits required for Windows NT
0         Reserved. Must be written as zeroes; returns zeroes when read



Table 4.13 Cache Tag Register Fields 
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Virtual-to-Physical Address Translation Process 

During virtual-to-physical address translation, the CPU compares the 
8-bit ASID (if the Global bit, G, is not set) of the virtual address to the ASID 
of the TLB entry to see if there is a match. 

The following comparison is also made: 

• For the 64-bit virtual addresses, the highest 15-to-27 bits (depending 
upon the page size) of the virtual address are compared to the con- 
tents of the TLB virtual page number. 

If a TLB entry matches, the physical address and access control bits (C,
D, and V) are retrieved from the matching TLB entry. While the V bit of the
entry must be set for a valid translation to take place, it is not involved in
the determination of a matching TLB entry.
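The matching rule described above can be sketched in software as follows. This is a simplified model, not the hardware algorithm: it ignores the region bits and address-space checks, stores VPN2 already shifted right by 13, and treats the 12-bit Mask as "ignore these VPN2 bits". The structure and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint64_t vpn2;    /* virtual page number / 2 (vaddr >> 13) */
    uint32_t mask;    /* 12-bit page mask (virtual address bits 24:13) */
    uint8_t  asid;    /* address space identifier of the entry */
    int      global;  /* nonzero if G is set in both EntryLo0 and EntryLo1 */
} tlb_entry_model;

/* Returns 1 if the entry matches the address, 0 otherwise.
 * The V bit is deliberately absent: it does not take part in matching. */
static int tlb_entry_matches(const tlb_entry_model *e,
                             uint64_t vaddr, uint8_t current_asid)
{
    uint64_t ignore = (uint64_t)(e->mask & 0xFFFu);
    uint64_t va_vpn2 = vaddr >> 13;

    if ((va_vpn2 & ~ignore) != (e->vpn2 & ~ignore))
        return 0;                       /* page number differs */
    return e->global || e->asid == current_asid;
}
```

After a match, a fuller model would check V and D to decide between a valid translation, a TLB invalid exception, and a TLB modification exception.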

Figure 4.19 illustrates the TLB address translation process. 
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Note: For valid address space 
see the section in this chapter 
that describes Operating Modes. 




Exception 
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Figure 4.19 TLB Address Translation 
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TLB Misses 

If there is no TLB entry that matches the virtual address, a TLB miss
exception occurs. If the access control bits (D and V) indicate that the
access is not valid, a TLB modification or TLB invalid exception occurs. If
the C bits equal binary 010, the physical address that is retrieved accesses
main memory, bypassing the cache.

TLB Instructions 

Table 4.14 lists the instructions that the CPU provides for working with
the TLB. See Appendix A for a detailed description of these instructions.



Op Code   Description of Instruction

TLBP      Translation Lookaside Buffer Probe
TLBR      Translation Lookaside Buffer Read
TLBWI     Translation Lookaside Buffer Write Index
TLBWR     Translation Lookaside Buffer Write Random



Table 4.14 TLB Instructions 
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Processing 



This chapter describes the CPU exception processing, including an 
explanation of exception processing, followed by the format and use of 
each CPU exception register. 

The chapter concludes with a description of each exception's cause, 
together with the manner in which the CPU processes and services these 
exceptions. For information about Floating-Point Unit exceptions, see 
Chapter 7. 

How Exception Processing Works 

The processor receives exceptions from a number of sources, including 
translation lookaside buffer (TLB) misses, arithmetic overflows, I/O 
interrupts, and system calls. When the CPU detects one of these 
exceptions, the normal sequence of instruction execution is suspended 
and the processor enters Kernel mode (see Chapter 4 for a description of 
system operating modes). 

The processor then disables interrupts and forces execution of a 
software exception processor (called a handler) located at a fixed address. 
The handler may save the context of the processor, including the contents 
of the program counter, the current operating mode (User or Supervisor), 
and the status of the interrupts (enabled or disabled). This context would 
be saved so it can be restored when the exception has been serviced. 

When an exception occurs, the CPU loads the Exception Program
Counter (EPC) register with a location where execution can restart after the
exception has been serviced. The restart location in the EPC register is the
address of the instruction that caused the exception or, if the instruction
was executing in a branch delay slot, the address of the branch instruction
immediately preceding the delay slot.

The registers described later in the chapter assist in this exception 
processing by retaining address, cause and status information. 

For a description of the exception handling process, see the description 
of the individual exception contained in this chapter, or the flowcharts at 
the end of this chapter. 

Exception Processing Registers 

This section describes the CP0 registers that are used in exception
processing. Table 5.1 on page 5-2 lists these registers, along with their
numbers; each register has a unique identification number that is referred
to as its register number. For instance, the ECC register is register number
26. The remaining CP0 registers are used in memory management, as
described in Chapter 4.

Software examines the CP0 registers during exception processing to
determine the cause of the exception and the state of the CPU at the time
the exception occurred. The registers in Table 5.1 are used in exception
processing, and are described in the sections that follow.
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Register Name 


Reg. No. 


Context                                       4
BadVAddr (Bad Virtual Address)                8
Count                                         9
Compare register                              11
Status                                        12
Cause                                         13
EPC (Exception Program Counter)               14
XContext                                      20
ECC                                           26
CacheErr (Cache Error and Status)             27
ErrorEPC (Error Exception Program Counter)    30

Table 5.1 CP0 Exception Processing Registers

Context Register (4) 

The Context register is a read/write register containing the pointer to an 
entry in the page table entry (PTE) array; this array is an operating system 
data structure that stores virtual-to-physical address translations. When 
there is a TLB miss, the CPU loads the TLB with the missing translation 
from the PTE array. Normally, the operating system uses the Context 
register to address the current page map which resides in the kernel- 
mapped segment, kseg3. The Context register duplicates some of the 
information provided in the BadVAddr register, but the information is 
arranged in a form that is more useful for a software TLB exception 
handler. Figure 5.1 shows the format of the Context register; Table 5.2, 
which follows the figure, describes the Context register fields. 
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Figure 5. 1 Context B 


agister Format 







Field     Description

BadVPN2   This field is written by hardware on a miss. It contains
          the virtual page number (VPN) of the most recent virtual
          address that did not have a valid translation.
PTEBase   This field is a read/write field for use by the operating
          system. It is normally written with a value that allows
          the operating system to use the Context register as a
          pointer into the current PTE array in memory.



Table 5.2 Context Register Fields 

The 19-bit BadVPN2 field contains bits 31:13 of the virtual address that
caused the TLB miss; bit 12 is excluded because a single TLB entry maps
to an even-odd page pair. For a 4-Kbyte page size, this format can directly
address the pair-table of 8-byte PTEs. For other page and PTE sizes,
shifting and masking this value produces the appropriate address.
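To make the "directly address the pair-table" remark concrete, the sketch below reconstructs the value the Context register would hold for a given PTEBase and faulting address. It assumes the standard 32-bit Context layout (PTEBase in bits 31:23, BadVPN2 in bits 22:4, zero in bits 3:0), which is not shown in this text. Each even-odd pair of 8-byte PTEs occupies 16 bytes, which is why BadVPN2 sits four bits up. The function name is hypothetical.

```c
#include <stdint.h>

/* Assumed Context layout: PTEBase = bits 31:23, BadVPN2 = bits 22:4,
 * bits 3:0 = zero. Returns the modeled Context register value, which
 * for 4-Kbyte pages and 8-byte PTEs is the address of the PTE pair. */
static uint32_t context_value(uint32_t ptebase, uint32_t badvaddr)
{
    uint32_t badvpn2 = (badvaddr >> 13) & 0x7FFFFu;  /* VA bits 31:13 */
    return (ptebase & 0xFF800000u) | (badvpn2 << 4);
}
```

Both pages of an even-odd pair produce the same Context value, so a refill handler can load the two PTEs from consecutive locations starting at that address.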
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Bad Virtual Address Register (BadVAddr) (8) 

The Bad Virtual Address register (BadVAddr) is a read-only register that
displays the most recent virtual address that caused one of the following
exceptions: Address Error (e.g., unaligned access), TLB Invalid, TLB
Modified, TLB Refill, Virtual Coherency Data Access, or Virtual Coherency
Instruction Fetch.

The processor does not write to the BadVAddr register when the EXL bit 
in the Status register is set to a 1. 

Figure 5.2 shows the format of the BadVAddr register. 
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Figure 5.2 BadVAddr Register Format 

Note: The BadVAddr register does not save any information for bus 
errors, since bus errors are not addressing errors. 

Count Register (9) 

The Count register acts as a timer, incrementing at a constant rate (half
the maximum instruction issue rate) whether or not an instruction is
executed or retired, or any forward progress is made through the pipeline.

This register can be read or written. It can be written for diagnostic 
purposes or system initialization; for example, to synchronize processors. 

Figure 5.3 shows the format of the Count register. 
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Figure 5.3 Count Register Format 





Compare Register (11)

The Compare register acts as a timer (see also the Count register); it 
maintains a stable value that does not change on its own. 

When the value of the Count register equals the value of the Compare 
register, interrupt bit IP(7) in the Cause register is set. This causes an 
interrupt as soon as the interrupt is enabled. 

Writing a value to the Compare register, as a side effect, clears the timer 
interrupt. 

For diagnostic purposes, the Compare register is a read/write register. 
In normal use however, the Compare register is write-only. Figure 5.4 
shows the format of the Compare register. 
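A common use of the Count and Compare pair is a periodic timer interrupt. The sketch below shows only the arithmetic, under two assumptions that are not stated in this section: both registers are 32 bits wide, and the maximum issue rate is one instruction per pipeline clock, so Count advances at half the pipeline clock frequency. The function names are hypothetical.

```c
#include <stdint.h>

/* Number of Count increments in the given number of microseconds,
 * assuming Count runs at half the pipeline clock frequency. */
static uint32_t count_ticks_for_us(uint32_t pclock_hz, uint32_t us)
{
    return (uint32_t)(((uint64_t)(pclock_hz / 2u) * us) / 1000000u);
}

/* Value to write to Compare for the next interrupt. Unsigned 32-bit
 * addition wraps modulo 2^32, matching a wrapping 32-bit counter. */
static uint32_t next_compare(uint32_t count_now, uint32_t interval_ticks)
{
    return count_now + interval_ticks;
}
```

Writing the result to Compare both schedules the next match and, as noted above, clears the pending timer interrupt.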
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Figure 5.4 Compare Register Format 
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Status Register (12) 

The Status register (SR) is a read/write register that contains the
operating mode, interrupt enabling, and the diagnostic states of the
processor. Figure 5.5 shows the format of the entire register, including
descriptions of the fields. Some of the more important fields are:

• The 8-bit Interrupt Mask (IM) field controls the enabling of eight
interrupt conditions. An interrupt is taken only if interrupts are
enabled and the corresponding bits are set in both the Interrupt
Mask field of the Status register and the Interrupt Pending field of the
Cause register. For more information, refer to the Interrupt Pending
(IP) field of the Cause register. IM[1:0] are the masks for the two
software interrupts, while IM[7:2] correspond to Int[5:0].

• The 4-bit Coprocessor Usability (CU) field controls the usability of
four possible coprocessors. Regardless of the CU0 bit setting, CP0 is
always usable in Kernel mode. For all other cases, an instruction for or
access to an unusable coprocessor causes an exception.

• The 9-bit Diagnostic Status (DS) field (Status[24:16]) is used for
self-testing, and checks the cache and virtual memory system.

• The Reverse-Endian (RE) bit, bit 25, reverses the endianness of the
machine. The processor can be configured as either little-endian or
big-endian at system reset. This selection is always used in Kernel
and Supervisor modes, and also in User mode when the RE bit is 0.
Setting the RE bit to 1 inverts the User mode endianness.

Status Register Format 

Figure 5.5 shows the format of the Status register. Table 5.3, which 
follows the figure, describes the Status register fields. 
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Figure 5.5 Status Register 
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Field 


Description 


CU      Controls the usability of each of the four coprocessor unit numbers.
        CP0 is always usable when in Kernel mode, regardless of the setting
        of the CU0 bit.
        1 -> usable    0 -> unusable

FR      Enables additional floating-point registers
        0 -> 16 registers    1 -> 32 registers

RE      Reverse-Endian bit, valid in User mode.

BEV     Controls the location of TLB refill and general exception vectors.
        0 -> normal    1 -> bootstrap

SR      1 -> Indicates a soft reset or NMI has occurred.

CH      Hit (tag match and valid state) or miss indication for last CACHE
        Hit Invalidate, Hit Write Back Invalidate, Hit Write Back, or Hit
        Set Virtual for a primary cache.
        0 -> miss    1 -> hit

CE      Contents of the ECC register set or modify the check bits of the
        caches when CE = 1; see description of the ECC register.

DE      Specifies that cache parity errors cannot cause exceptions.
        0 -> parity remains enabled    1 -> disables parity

0       Reserved. Must be written as zeroes, and returns zeroes when read.

IM      Interrupt Mask: controls the enabling of each of the external,
        internal, and software interrupts. An interrupt is taken if
        interrupts are enabled, and the corresponding bits are set in both
        the Interrupt Mask field of the Status register and the Interrupt
        Pending field of the Cause register. IM[7:2] correspond to
        interrupts Int[5:0] and IM[1:0] to the software interrupts.
        0 -> disabled    1 -> enabled

KX      Controls whether the TLB Refill Vector or the XTLB Refill Vector
        address is used for TLB misses on kernel addresses.
        0 -> TLB Refill Vector    1 -> XTLB Refill Vector

SX      Enables 64-bit virtual addressing and operations in Supervisor
        mode. The extended-addressing TLB refill exception is used for TLB
        misses on supervisor addresses.
        0 -> 32-bit    1 -> 64-bit

UX      Enables 64-bit virtual addressing and operations in User mode. The
        extended-addressing TLB refill exception is used for TLB misses on
        user addresses.
        0 -> 32-bit    1 -> 64-bit

KSU     Mode bits
        10 (binary) -> User    01 (binary) -> Supervisor    00 (binary) -> Kernel

ERL     Error Level
        0 -> normal    1 -> error

EXL     Exception Level
        0 -> normal    1 -> exception
        Note: When going from 0 to 1, IE should be disabled (0) first. This
        would be done when preparing to return from the exception handler,
        such as before executing the ERET instruction.

IE      Interrupt Enable
        0 -> disables interrupts    1 -> enables interrupts



Table 5.3 Status Register Fields 
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Status Register Modes and Access States 

Fields of the Status register set the modes and access states described 
in the sections that follow. 

Interrupt Enable: Interrupts are enabled when all of the following 
conditions are true: 

• IE = 1

• EXL = 0

• ERL = 0

If these conditions are met, the settings of the IM bits identify the 
interrupt. 

Note: Setting the IE bit may be delayed by up to 3 cycles. If performing 
nested interrupts, re-enable the IE bit first. 

Operating Modes: The following CPU Status register bit settings are 
required for User, Kernel, and Supervisor modes (see Chapter 4 for more 
information about operating modes). 

• The processor is in User mode when KSU = 10 (binary), EXL = 0, and
ERL = 0.

• The processor is in Supervisor mode when KSU = 01 (binary), EXL = 0,
and ERL = 0.

• The processor is in Kernel mode when KSU = 00 (binary), or EXL = 1,
or ERL = 1.
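
The interrupt-enable and operating-mode rules above can be expressed as two small predicates. The sketch assumes the standard R4000-family Status bit positions (IE in bit 0, EXL in bit 1, ERL in bit 2, KSU in bits 4:3, IM in bits 15:8) and that the Cause register IP field occupies bits 15:8; Figure 5.5 is not reproduced here, so treat these positions as assumptions. All names are hypothetical.

```c
#include <stdint.h>

#define SR_IE    0x00000001u  /* interrupt enable (assumed bit 0) */
#define SR_EXL   0x00000002u  /* exception level (assumed bit 1) */
#define SR_ERL   0x00000004u  /* error level (assumed bit 2) */
#define SR_IM    0x0000FF00u  /* interrupt mask (assumed bits 15:8) */

typedef enum {
    MODE_KERNEL, MODE_SUPERVISOR, MODE_USER, MODE_UNDEFINED
} cpu_mode;

/* Operating mode: EXL or ERL forces Kernel mode, otherwise KSU decides. */
static cpu_mode status_mode(uint32_t sr)
{
    if (sr & (SR_EXL | SR_ERL))
        return MODE_KERNEL;
    switch ((sr >> 3) & 0x3u) {       /* KSU, assumed bits 4:3 */
    case 0u: return MODE_KERNEL;
    case 1u: return MODE_SUPERVISOR;
    case 2u: return MODE_USER;
    default: return MODE_UNDEFINED;   /* KSU = 11 is not listed */
    }
}

/* An interrupt is taken when IE = 1, EXL = 0, ERL = 0, and some
 * pending bit in Cause has its mask bit set in Status. */
static int interrupt_taken(uint32_t sr, uint32_t cause)
{
    if (!(sr & SR_IE) || (sr & (SR_EXL | SR_ERL)))
        return 0;
    return (sr & cause & SR_IM) != 0u;
}
```
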
32- and 64-bit Virtual Addressing: The following CPU Status register 
bit settings select 32- or 64-bit virtual addressing for User and Supervisor 
operating modes. Enabling 64-bit virtual addressing permits the execution 
of 64-bit opcodes and translation of 64-bit virtual addresses. 64-bit virtual 
addressing for User and Supervisor modes can be set independently but is 
always used for Kernel mode. 

• The KX field controls whether the TLB Refill Vector or the XTLB Refill 
Vector address is used for TLB misses on Kernel addresses. 64-bit op- 
codes are always valid in Kernel mode. 

• 64-bit addressing and operations are enabled for Supervisor mode
when SX = 1.

• 64-bit addressing and operations are enabled for User mode when
UX = 1.

Kernel Address Space Accesses: Access to the kernel address space is 
allowed when the processor is in Kernel mode. 

Supervisor Address Space Accesses: Access to the supervisor address 
space is allowed when the processor is in Kernel or Supervisor mode, as 
described above in the paragraph titled Operating Modes. 

User Address Space Accesses: Access to the user address space is 
allowed in any of the three operating modes. 

Status Register Reset 

The contents of the Status register are undefined at reset, except that
the ERL and BEV bits are both set to 1.

The SR bit distinguishes between Reset and Soft Reset (Nonmaskable 
Interrupt [NMI]). 
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Cause Register (13) 

The 32-bit read/write Cause register describes the cause of the most
recent exception.

Figure 5.6 shows the fields of this register; Table 5.4, which follows the
figure, describes the Cause register fields. A 5-bit exception code (ExcCode)
indicates the cause of the most recent exception, as listed in Table 5.5 on
page 5-8.

All bits in the Cause register, with the exception of the IP(1:0) bits, are
read-only; IP(1:0) are used for software interrupts.





Cause register fields, from high-order to low-order bits: BD (bit 31),
0 (bit 30), CE (bits 29:28), 0 (bits 27:16), IP (bits 15:8), 0 (bit 7),
ExcCode (bits 6:2), 0 (bits 1:0).



Figure 5.6 Cause Register Format 



Field     Description

BD        Indicates whether the last exception taken occurred in a branch
          delay slot.
          1 -> delay slot
          0 -> normal
CE        Coprocessor unit number referenced when a Coprocessor Unusable
          exception is taken.
IP        Indicates an interrupt is pending.
          1 -> interrupt pending
          0 -> no interrupt
ExcCode   Exception code field (see Table 5.5 on page 5-8)
0         Reserved. Must be written as zeroes, and returns zeroes when read.



Table 5.4 Cause Register Fields 
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Exception 

Code 
Value 


Mnemonic 


Description 





0        Int        Interrupt
1        Mod        TLB modification exception
2        TLBL       TLB exception (load or instruction fetch)
3        TLBS       TLB exception (store)
4        AdEL       Address error exception (load or instruction fetch)
5        AdES       Address error exception (store)
6        IBE        Bus error exception (instruction fetch)
7        DBE        Bus error exception (data reference: load or store)
8        Sys        Syscall exception
9        Bp         Breakpoint exception
10       RI         Reserved instruction exception
11       CpU        Coprocessor Unusable exception
12       Ov         Arithmetic Overflow exception
13       Tr         Trap exception
14       -          Reserved
15       FPE        Floating-Point exception
16-31    -          Reserved



Table 5.5 Cause Register ExcCode Field 
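A general exception handler usually begins by extracting ExcCode and dispatching on it. The sketch below uses the field positions recoverable from Figure 5.6 (BD in bit 31, CE in bits 29:28, IP in bits 15:8, ExcCode in bits 6:2) and the mnemonics of Table 5.5. The function names are hypothetical.

```c
#include <stdint.h>

static unsigned cause_exccode(uint32_t cause) { return (cause >> 2) & 0x1Fu; }
static unsigned cause_ip(uint32_t cause)      { return (cause >> 8) & 0xFFu; }
static unsigned cause_ce(uint32_t cause)      { return (cause >> 28) & 0x3u; }
static unsigned cause_bd(uint32_t cause)      { return (cause >> 31) & 0x1u; }

/* Mnemonic for an ExcCode value, following Table 5.5. */
static const char *exccode_name(unsigned code)
{
    static const char *const names[16] = {
        "Int",  "Mod",  "TLBL", "TLBS", "AdEL", "AdES", "IBE", "DBE",
        "Sys",  "Bp",   "RI",   "CpU",  "Ov",   "Tr",   "Reserved", "FPE"
    };
    return (code < 16u) ? names[code] : "Reserved";
}
```

For a Coprocessor Unusable exception (ExcCode 11), `cause_ce` identifies which coprocessor unit was referenced.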

Exception Program Counter (EPC) Register (14) 

The Exception Program Counter (EPC) is a read/write register that
contains the address at which processing resumes after an exception has
been serviced.

For synchronous exceptions, the EPC register contains either:

• the virtual address of the instruction that was the direct cause of the
exception, or

• the virtual address of the immediately preceding branch or jump
instruction (when the instruction is in a branch delay slot, and the
Branch Delay bit in the Cause register is set).

The processor does not write to the EPC register when the EXL bit in the
Status register is set to a 1.
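
When a handler needs the address of the instruction that actually faulted (for example, to emulate it), it has to account for the branch delay case described above. A minimal sketch, assuming 4-byte instructions and the BD bit in Cause bit 31; the function name is hypothetical:

```c
#include <stdint.h>

#define CAUSE_BD 0x80000000u  /* Branch Delay bit of the Cause register */

/* EPC points at the faulting instruction, or at the preceding branch
 * when the fault occurred in a branch delay slot (BD = 1). In the
 * latter case the faulting instruction is the next word after EPC. */
static uint64_t faulting_instruction_address(uint64_t epc, uint32_t cause)
{
    return (cause & CAUSE_BD) ? epc + 4u : epc;
}
```

Execution still resumes at EPC itself, so the branch is re-executed together with its delay slot.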

Figure 5.7 shows the format of the EPC register. 




Figure 5.7 EPC Register Format 
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XContext Register (20) 

The read/write XContext register contains a pointer to an entry in the 
page table entry (PTE) array, an operating system data structure that 
stores virtual-to-physical address translations. When there is a TLB miss, 
the operating system software loads the TLB with the missing translation 
from the PTE array. The XContext register duplicates some of the 
information provided in the BadVAddr register, and puts it in a form useful 
for a software TLB exception handler. 

The XContext register is for use with the XTLB refill handler, which loads 
TLB entries for references to a 64-bit address space, and is included solely 
for operating system use. The operating system sets the PTE base field in 
the register, as needed. Normally, the operating system uses the XContext 
register to address the current page map, which resides in the kernel- 
mapped segment kseg3. 

Figure 5.8 shows the format of the XContext register; Table 5.6, which 
follows the figure, describes the XContext register fields. 
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Figure 5.8 XContezt Register Format 

The 27-bit BadVPN2 field has bits 39:13 of the virtual address that 
caused the TLB miss; bit 12 is excluded because a single TLB entry maps 
to an even-odd page pair. For a 4-Kbyte page size, this format may be used 
directly to address the pair-table of 8-byte PTEs. For other page and PTE 
sizes, shifting and masking this value produces the appropriate address. 



Field     Description

BadVPN2   The Bad Virtual Page Number/2 field is written by hardware on a
          miss. It contains the VPN of the most recent invalidly translated
          virtual address.
R         The Region field contains bits 63:62 of the virtual address.
          00 (binary) = user
          01 (binary) = supervisor
          11 (binary) = kernel
PTEBase   The Page Table Entry Base read/write field is normally written with
          a value that allows the operating system to use the Context register
          as a pointer into the current PTE array in memory.

Table 5.6 XContext Register Fields

Error Checking and Correcting (ECC) Register (26) 

The 8-bit Error Checking and Correcting (ECC) register reads or writes 
primary-cache data parity bits for cache initialization, cache diagnostics, 
or cache error processing. (Tag parity is loaded from and stored to the 
TagLo register.) 

The ECC register is loaded by the Index Load Tag CACHE operation. 
Content of the ECC register is: 

• written into the primary data cache on store instructions (instead of 
the computed parity) when the CE bit of the Status register is set 

• substituted for the computed instruction parity for the CACHE oper- 
ation Fill 

To force a cache parity value, use the Status CE bit and the ECC register. 

Figure 5.9 shows the format of the ECC register; Table 5.7, which follows 
the figure, describes the register fields. 

Figure 5.9 ECC Register Format 







Field     Description
--------  ----------------------------------------------------------------
ECC       An 8-bit field specifying the parity bits read from or written
          to a primary cache.

0         Reserved. Must be written as zeroes, and returns zeroes when
          read.

Table 5.7 ECC Register Fields 

Cache Error (CacheErr) Register (27) 

The 32-bit read-only CacheErr register processes parity errors in the 
primary cache. Parity errors cannot be corrected. 

The CacheErr register holds cache index and status bits that indicate 
the source and nature of the error; it is loaded when a Cache Error 
exception is asserted. When a read response returns with bad parity this 
exception is also asserted. 

Figure 5.10 shows the format of the CacheErr register; Table 5.8, which 
follows the figure, describes the CacheErr register fields. 

Field     Description
--------  ----------------------------------------------------------------
ER        Type of reference
            0 -> instruction
            1 -> data

EC        Cache level of the error
            0 -> primary
            1 -> reserved

ED        Indicates if a data field error occurred
            0 -> no error
            1 -> error

ET        Indicates if a tag field error occurred
            0 -> no error
            1 -> error

ES        Indicates the error occurred accessing processor-managed
          resources, in response to an external request.
            0 -> internal reference
            1 -> external reference
          Since the R4600/R4700 doesn't have any external events that
          would look in a cache (which is the only processor-managed
          resource), this bit would not be set under normal operating
          conditions.

EE        Set if the error occurred on the SysAD bus.
          Taking a cache error exception sets/clears this bit.

EB        Set if a data error occurred in addition to the instruction
          error (indicated by the remainder of the bits). If so, this
          requires flushing the data cache after fixing the instruction
          error.

SIdx      Physical address 21:3 of the reference that encountered the
          error. The address may not be the same as the address of the
          double word in error, but it is sufficient to locate that
          double word in the secondary cache.

PIdx      Virtual address 13:12 of the double word in error. To be used
          with SIdx to construct a virtual index for the primary caches.
          Only the lower two bits (bits 1 and 0) are vAddr; the high bit
          (bit 2) is zero.

0         Reserved. Must be written as zeroes, and returns zeroes when
          read.

Table 5.8 CacheErr Register Fields 
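
A handler that logs cache errors usually begins by splitting the CacheErr value into these fields. The C sketch below shows one way to do so. The bit positions (ER 31, EC 30, ED 29, ET 28, ES 27, EE 26, EB 25, SIdx 21:3, PIdx 2:0) follow the usual R4000-family layout and are an assumption, because Figure 5.10 is not reproduced in this text; the structure and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the CacheErr register. Bit positions are assumed
 * (Figure 5.10 is not reproduced here). */
struct cache_err_info {
    bool     data_ref;     /* ER: 1 = data reference, 0 = instruction  */
    bool     data_error;   /* ED: data field error                     */
    bool     tag_error;    /* ET: tag field error                      */
    bool     sysad_error;  /* EE: error occurred on the SysAD bus      */
    bool     both_caches;  /* EB: data error in addition to instruction */
    uint32_t sidx;         /* physical address bits 21:3, left in place */
    uint32_t pidx;         /* field bits 2:0 (VA 13:12 in bits 1:0)     */
};

static struct cache_err_info cache_err_decode(uint32_t cache_err)
{
    struct cache_err_info info;

    info.data_ref    = (cache_err >> 31) & 1u;
    info.data_error  = (cache_err >> 29) & 1u;
    info.tag_error   = (cache_err >> 28) & 1u;
    info.sysad_error = (cache_err >> 26) & 1u;
    info.both_caches = (cache_err >> 25) & 1u;
    info.sidx        = cache_err & 0x003FFFF8u;  /* keep bits 21:3 */
    info.pidx        = cache_err & 0x7u;         /* keep bits 2:0  */
    return info;
}
```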

Error Exception Program Counter (Error EPC) Register (30) 

The ErrorEPC register is similar to the EPC register, except that ErrorEPC 
is used on parity error exceptions. It is also used to store the program 
counter (PC) on Reset, Soft Reset, and nonmaskable interrupt (NMI) 
exceptions. 

The read/write ErrorEPC register contains the virtual address at which 
instruction processing can resume after servicing an error. This address 
can be: 

• the virtual address of the instruction that caused the exception 

• the virtual address of the immediately preceding branch or jump in- 
struction, when this address is in a branch delay slot. 

There is no branch delay slot indication for the ErrorEPC register. 

Figure 5.11 shows the format of the ErrorEPC register. 

Figure 5.11 ErrorEPC Register Format 



Processor Exceptions 

This section describes the processor exceptions: the cause of each 
exception, its processing by the hardware, and its servicing by a 
handler (software). The types of exception, with exception processing 
operations, are described in the next section. 

Exception Types 

This section gives sample exception handler operations for the following 
exception types: 

• reset 

• soft reset 

• nonmaskable interrupt (NMI) 

• cache error 

• remaining processor exceptions 

When the EXL bit in the Status register is 0, either User or Supervisor 
operating mode is specified by the KSU bits in the Status register. When 
the EXL bit or the ERL bit is a 1, the processor is in Kernel mode. 

When the processor takes an exception, the EXL bit is set to 1, which 
means the system is in Kernel mode. After saving the appropriate state, the 
exception handler typically resets the EXL bit back to 0. When restoring 
the state and restarting, the handler sets the EXL bit back to 1. 

Returning from an exception also resets the EXL bit to 0 (see the ERET 
instruction in Appendix A). 
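
The mode rules above can be summarized in a few lines of C. This is a minimal sketch: the bit positions (EXL at bit 1, ERL at bit 2, KSU at bits 4:3) and the KSU encodings (00 = Kernel, 01 = Supervisor, 10 = User) follow the usual R4000-family Status register layout and are assumptions as far as this section is concerned; the names are hypothetical.

```c
#include <stdint.h>

/* Status register bit positions (assumed R4000-family layout). */
#define SR_EXL       (1u << 1)
#define SR_ERL       (1u << 2)
#define SR_KSU_SHIFT 3
#define SR_KSU_MASK  0x3u

enum cpu_mode { MODE_KERNEL, MODE_SUPERVISOR, MODE_USER, MODE_RESERVED };

/* Return the operating mode implied by a Status register value. */
static enum cpu_mode status_mode(uint32_t status)
{
    /* EXL or ERL set forces Kernel mode regardless of KSU. */
    if (status & (SR_EXL | SR_ERL))
        return MODE_KERNEL;

    switch ((status >> SR_KSU_SHIFT) & SR_KSU_MASK) {
    case 0:  return MODE_KERNEL;
    case 1:  return MODE_SUPERVISOR;
    case 2:  return MODE_USER;
    default: return MODE_RESERVED;
    }
}
```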

In the following sections, sample hardware processes for various 
exceptions are shown, together with the servicing required by the handler 
(software). 

Reset Exception Process 

Figure 5.12 shows the Reset exception process. 

Figure 5.12 Reset Exception Processing 

Cache Error Exception Process 

Figure 5.13 shows the Cache Error exception process. 

Figure 5.13 Cache Error Exception Processing 

Soft Reset and NMI Exception Process 

Figure 5.14 shows the Soft Reset and NMI exception process. 

Figure 5.14 Soft Reset and NMI Exception Processing 

General Exception Process 

Figure 5.15 shows the process used for exceptions other than Reset, Soft 
Reset, NMI, and Cache Error. 

Figure 5.15 General Exception Processing (Except Reset, Soft Reset, NMI, 
and Cache Error) 

Exception Vector Locations 

The Reset, Soft Reset, and NMI exceptions are always vectored to 
location 0xFFFF FFFF BFC0 0000 (virtual address), corresponding to 
kseg1. 

Addresses for all other exceptions are a combination of a vector offset 
and a base address. The base address is determined by the BEV bit of the 
Status register, as shown in Table 5.9. 

Table 5.10 shows the vector offset that is added to the base address to 
create the exception address. 



BEV   R4600/R4700 Processor Vector Base   Cache Error Base
---   ---------------------------------   ---------------------
0     0xFFFF FFFF 8000 0000               0xFFFF FFFF A000 0000
1     0xFFFF FFFF BFC0 0200               0xFFFF FFFF BFC0 0200

Table 5.9 Exception Vector Base Addresses 

As shown in Table 5.9, when BEV = 0, the vector base for the Cache 
Error exception changes from kseg0 (0xFFFF FFFF 8000 0000) to kseg1 
(0xFFFF FFFF A000 0000). 

When BEV = 1, the vector base for the Cache Error exception is 0xFFFF 
FFFF BFC0 0200. This is an uncached and unmapped space, allowing the 
exception to bypass the cache and TLB. 



Exception                               R4600/R4700 Processor Vector Offset
--------------------------------------  -----------------------------------
TLB refill, EXL = 0                     0x000
XTLB refill, EXL = 0 (X = 64-bit TLB)   0x080
Cache Error                             0x100
Others                                  0x180

Table 5.10 Exception Vector Offsets 
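
The base and offset tables combine as shown in the following C sketch, which is offered only as an illustration of the arithmetic; the enumeration and function names are hypothetical. A TLB or XTLB refill taken while EXL = 1 uses the "Others" offset, so the caller is assumed to pass VEC_OTHER in that case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Exception classes in the order of the vector offset table. */
enum vec_class { VEC_TLB_REFILL, VEC_XTLB_REFILL, VEC_CACHE_ERROR, VEC_OTHER };

/* Compute the 64-bit virtual address of the exception vector from the
 * exception class and the BEV bit of the Status register. */
static uint64_t exception_vector(enum vec_class cls, bool bev)
{
    static const uint64_t offset[] = { 0x000, 0x080, 0x100, 0x180 };
    uint64_t base;

    if (bev)
        base = UINT64_C(0xFFFFFFFFBFC00200);   /* same base for all classes */
    else if (cls == VEC_CACHE_ERROR)
        base = UINT64_C(0xFFFFFFFFA0000000);   /* uncached kseg1 */
    else
        base = UINT64_C(0xFFFFFFFF80000000);   /* kseg0 */

    return base + offset[cls];
}
```

For the Cache Error class this yields 0xFFFF FFFF A000 0100 when BEV = 0 and 0xFFFF FFFF BFC0 0300 when BEV = 1.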

Priority of Exceptions 

The remainder of this chapter describes exceptions in the order of their 
priority, as shown in Table 5.11. While more than one exception can occur 
for a single instruction, only the exception with the highest priority is 
reported. 



Priority  Exception
--------  ----------------------------------------------------------------
1         Reset (highest priority)
2         Soft Reset
3         Nonmaskable Interrupt (NMI)
4         Address error - Instruction fetch
5         TLB refill - Instruction fetch
6         TLB invalid - Instruction fetch
7         Cache error - Instruction fetch
8         Bus error - Instruction fetch
9         Integer overflow, Trap, System Call, Breakpoint, Reserved
          Instruction, Coprocessor Unusable, or Floating-Point Exception
10        Address error - Data access
11        TLB refill - Data access
12        TLB invalid - Data access
13        TLB modified - Data write
14        Cache error - Data access
15        Bus error - Data access
16        Interrupt (lowest priority)

Table 5.11 Exception Priority Order 

Generally speaking, the exceptions described in the following sections 
are handled ("processed") by hardware; these exceptions are then serviced 
by software. 

Reset Exception 

This section explains the Reset exception. 

Cause 

The Reset exception occurs when the ColdReset* signal (see footnote 1) is 
asserted and then deasserted. This exception is not maskable. 

Processing 

The CPU provides a special exception vector for this exception of: 
0xFFFF FFFF BFC0 0000 

The Reset vector resides in unmapped and uncached CPU address 
space, so the hardware need not initialize the TLB or the cache to process 
this exception. It also means the processor can fetch and execute 
instructions while the caches and virtual memory are in an undefined 
state. 

The contents of all registers in the CPU are undefined when this 
exception occurs, except for the following register fields: 

• In the Status register, SR is cleared to 0, and ERL and BEV are set to 
1. All other bits are undefined. 

• The Random register is initialized to the value of its upper bound. 

• The Wired register is initialized to 0. 

• Some of the Config register bits are initialized from the boot-time 
mode stream. 

Reset exception processing is shown in Figure 5.12 on page 12. 

Servicing 

The Reset exception is serviced by: 

• initializing all processor registers, coprocessor registers, caches, and 
the memory system 

• performing diagnostic tests 

• bootstrapping the operating system 



1. In the following sections (and throughout this manual) a signal followed by an 
asterisk, such as Reset*, is low active. 


Soft Reset Exception 

This section explains the Soft Reset exception. 

Cause 

The Soft Reset exception occurs in response to the Reset* input signal, 
and execution begins at the Reset vector when Reset* is deasserted. This 
exception is not maskable. 

Processing 

The Reset exception vector is used for this exception, located within 
unmapped and uncached address space so that the cache and TLB need 
not be initialized to process this exception. When a Soft Reset occurs, the 
SR bit of the Status register is set to distinguish this exception from a Reset 
exception. 

The primary purpose of the Soft Reset exception is to reinitialize the 
processor after a fatal error during normal operations. Unlike an NMI, all 
cache and bus state machines are reset by this exception. Like Reset, it 
can be used on the processor in any state; the caches, TLB, and normal 
exception vectors need not be properly initialized. Soft Reset preserves the 
state of the caches and memory system, while resetting the bus state and 
cache state machine. 

When this exception occurs, the contents of all registers are preserved 
except for: 

• ErrorEPC register, which contains the restart PC 

• ERL bit of the Status register, which is set to 1 

• SR bit of the Status register, which is set to 1 

• BEV bit of the Status register, which is set to 1 

Because the Soft Reset can abort cache and bus operations, cache and 
memory state is undefined when this exception occurs. 

Soft Reset exception processing is shown in Figure 5.14 on page 13. 

Servicing 

The Soft Reset exception is serviced by saving the current processor 
state for diagnostic purposes, and reinitializing for the Reset exception. 

Nonmaskable Interrupt (NMI) Exception 

This section explains the Nonmaskable Interrupt exception. 

Cause 

The Nonmaskable Interrupt (NMI) exception occurs in response to the 
falling edge of the NMI pin, or an external write to the Int*[6] bit of the 
Interrupt register. 

Unlike all other interrupts, this interrupt is not maskable; it occurs 
regardless of the settings of the EXL, ERL, and the IE bits in the Status 
register. 

Processing 

The Reset exception vector is used for this exception. This vector is 
located within unmapped and uncached address space so that the cache 
and TLB need not be initialized to process an NMI interrupt. When an NMI 
exception occurs, the SR bit of the Status register is set to differentiate this 
exception from a Reset exception. 

Because an NMI can occur in the midst of another exception, it is not 
normally possible to continue program execution after servicing an NMI. 

Unlike Reset and Soft Reset, but like other exceptions, NMI is taken only 
at instruction boundaries. The state of the caches and memory system are 
preserved by this exception. 

To terminate a pending read that has hung, the best approach is to 
return a bus error. However, if you wish to use a CPU exception to indicate 
a hung read, Soft Reset is preferable to NMI. 

When this exception occurs, the contents of all registers are preserved 
except for: 

• ErrorEPC register, which contains the restart PC 

• ERL bit of the Status register, which is set to 1 

• SR bit of the Status register, which is set to 1 

• BEV bit of the Status register, which is set to 1 

NMI exception processing is shown in Figure 5.14 on page 13. 

Servicing 

The NMI exception is serviced by saving the current processor state for 
diagnostic purposes, and reinitializing the system for the Reset exception. 

Address Error Exception 

This section explains the Address Error exception. 

Cause 

The Address Error exception occurs when an attempt is made to execute 
one of the following: 

• load or store a doubleword that is not aligned on a doubleword 
boundary (except for use of special instruction) 

• load, fetch, or store a word that is not aligned on a word boundary 
(except for use of special instruction) 

• load or store a halfword that is not aligned on a halfword boundary 

• reference the kernel address space from User or Supervisor mode 

• reference the supervisor address space from User mode 

This exception is not maskable. 
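
The alignment conditions in the list above reduce to a single mask test, sketched here in C. The function name is hypothetical, and the sketch deliberately ignores both the address-space privilege checks and the special unaligned-access instructions that the list exempts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true when an access of the given width would violate natural
 * alignment. size is the access width in bytes and must be a power of
 * two: 1 (byte), 2 (halfword), 4 (word), or 8 (doubleword). */
static bool is_misaligned(uint64_t vaddr, unsigned size)
{
    return (vaddr & (uint64_t)(size - 1)) != 0;
}
```

A word access at an address ending in 0x2, for example, fails the test, while a byte access never does.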

Processing 

The common exception vector is used for this exception. The AdEL or 
AdES code in the Cause register is set, indicating whether the instruction 
(shown by the EPC register and BD bit in the Cause register) caused the 
exception with an instruction reference, load operation, or store operation. 
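
A handler at the common exception vector typically reads the exception code from the Cause register before dispatching. The sketch below assumes the usual MIPS III encoding, which this section does not restate: ExcCode in Cause bits 6:2, BD in bit 31, and the numeric code values listed in the enumeration. The names are hypothetical.

```c
#include <stdint.h>

/* ExcCode values as commonly defined for MIPS III Cause registers
 * (assumed here; this section names the codes but not their values). */
enum exc_code {
    EXC_INT  = 0,  EXC_MOD  = 1,  EXC_TLBL = 2,  EXC_TLBS = 3,
    EXC_ADEL = 4,  EXC_ADES = 5,  EXC_IBE  = 6,  EXC_DBE  = 7,
    EXC_SYS  = 8,  EXC_BP   = 9,  EXC_RI   = 10, EXC_CPU  = 11,
    EXC_OV   = 12, EXC_TR   = 13, EXC_FPE  = 15
};

/* Extract the 5-bit ExcCode field (Cause bits 6:2). */
static unsigned cause_exccode(uint32_t cause)
{
    return (cause >> 2) & 0x1Fu;
}

/* Return 1 when the BD bit (Cause bit 31) is set, otherwise 0. */
static int cause_in_delay_slot(uint32_t cause)
{
    return (int)((cause >> 31) & 1u);
}
```

With this encoding, AdEL covers instruction fetches and loads, and AdES covers stores.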

When this exception occurs, the BadVAddr register retains the virtual 
address that was not properly aligned or referenced protected address 
space. The contents of the VPN field of the Context and EntryHi registers 
are undefined, as are the contents of the EntryLo register. 

The EPC register contains the address of the instruction that caused the 
exception, unless this instruction is in a branch delay slot. If it is in a 
branch delay slot, the EPC register contains the address of the preceding 
branch instruction and the BD bit of the Cause register is set as indication. 

Address Error exception processing is shown in Figure 5.15 on page 13. 

Servicing 

Typically the process executing at the time is handed a segmentation 
violation signal. This error is usually fatal to the process incurring the 
exception. 

To resume execution, the EPC register must be altered so that the 
unaligned reference instruction does not re-execute; this is accomplished 
by adding a value of 4 to the EPC register (EPC register + 4) before 
returning. 

If an unaligned reference instruction is in a branch delay slot, 
interpretation of the branch instruction is required to resume execution. 

TLB Exceptions 

This section explains the TLB Exceptions. For specifics on the 
exceptions listed here, refer to the following three subsections. 

Three types of TLB exceptions can occur: 

• TLB Refill occurs when there is no TLB entry that matches an at- 
tempted reference to a mapped address space. 

• TLB Invalid occurs when a virtual address reference matches a TLB 
entry that is marked invalid. 

• TLB Modified occurs when a store operation virtual address reference 
to memory matches a TLB entry which is marked valid but is not dirty 
(the entry is not writable). 

The following three subsections describe the TLB exceptions. 

TLB Refill Exception 

This subsection explains the TLB refill exception. 

Cause 

The TLB refill exception occurs when there is no TLB entry to match a 
reference to a mapped address space. This exception is not maskable. 

Processing 

There are two special exception vectors for this exception: one for 
references to 32-bit virtual address spaces, and one for references to 
64-bit virtual address spaces. The UX, SX, and KX bits of the Status register 
determine whether the user, supervisor, or kernel address spaces 
referenced are 32-bit or 64-bit spaces. All references use these vectors 
when the EXL bit is set to 0 in the Status register. This exception sets the 
TLBL or TLBS code in the ExcCode field of the Cause register. This code 
indicates whether the instruction, as shown by the EPC register and the 
BD bit in the Cause register, caused the miss by an instruction reference, 
load operation, or store operation. 

When this exception occurs, the BadVAddr, Context, XContext and 
EntryHi registers hold the virtual address that failed address translation. 
The EntryHi register also contains the ASID from which the translation 
fault occurred. The Random register normally suggests a valid location in 
which to place the replacement TLB entry. The contents of the EntryLo 
register are undefined. The EPC register contains the address of the 
instruction that caused the exception, unless this instruction is in a 
branch delay slot, in which case the EPC register contains the address of 
the preceding branch instruction and the BD bit of the Cause register is 
set. 

TLB Refill exception processing is shown in Figure 5.15 on page 13. 

Servicing 

To service this exception, the contents of the Context or XContext register 
are used as a virtual address to fetch memory locations containing the 
physical page frame and access control bits for a pair of TLB entries. The 
two entries are placed into the EntryLo0 and EntryLo1 registers; the EntryHi 
and EntryLo registers are then written into the TLB. 

It is possible that the virtual address used to obtain the physical address 
and access control information is on a page that is not resident in the TLB. 
This condition is processed by allowing a TLB refill exception in the TLB 
refill handler. This second exception goes to the common exception vector 
because the EXL bit of the Status register is set. 

TLB Invalid Exception 

This subsection explains the TLB invalid exception. 

Cause 

The TLB invalid exception occurs when a virtual address reference 
matches a TLB entry that is marked invalid (TLB valid bit cleared). This 
exception is not maskable. 

Processing 

The common exception vector is used for this exception. The TLBL or 
TLBS code in the ExcCode field of the Cause register is set. This indicates 
whether the instruction, as shown by the EPC register and BD bit in the 
Cause register, caused the miss by an instruction reference, load 
operation, or store operation. 

When this exception occurs, the BadVAddr, Context, XContext and 
EntryHi registers contain the virtual address that failed address 
translation. The EntryHi register also contains the ASID from which the 
translation fault occurred. The Random register normally contains a valid 
location in which to put the replacement TLB entry. The contents of the 
EntryLo registers are undefined. 

The EPC register contains the address of the instruction that caused the 
exception unless this instruction is in a branch delay slot, in which case 
the EPC register contains the address of the preceding branch instruction 
and the BD bit of the Cause register is set. 

TLB Invalid exception processing is shown in Figure 5.15 on page 13. 

Servicing 

A TLB entry is typically marked invalid when one of the following is true: 

• a virtual address does not exist 

• the virtual address exists, but is not in main memory (a page fault) 

• a trap is desired on any reference to the page (for example, to main- 
tain a reference bit or during debug) 

After servicing the cause of a TLB Invalid exception, the TLB entry is 
located with TLBP (TLB Probe), and replaced by an entry with that entry's 
Valid bit set. 

TLB Modified Exception 

This subsection explains the TLB modified exception. 

Cause 

The TLB modified exception occurs when a store operation virtual 
address reference to memory matches a TLB entry that is marked valid but 
is not dirty and therefore is not writable. This exception is not maskable. 

Processing 

The common exception vector is used for this exception, and the Mod 
code in the Cause register is set. 

When this exception occurs, the BadVAddr, Context, XContext and 
EntryHi registers contain the virtual address that failed address 
translation. The EntryHi register also contains the ASID from which the 
translation fault occurred. The contents of the EntryLo registers are 
undefined. 

The EPC register contains the address of the instruction that caused the 
exception unless that instruction is in a branch delay slot, in which case 
the EPC register contains the address of the preceding branch instruction 
and the BD bit of the Cause register is set. 

TLB Modified exception processing is shown in Figure 5.15 on page 13. 

Servicing 

The kernel uses the failed virtual address or virtual page number to 
identify the corresponding access control information. The page identified 
may or may not permit write accesses; if writes are not permitted, a write 
protection violation occurs. 

If write accesses are permitted, the page frame is marked dirty/writable 
by the kernel in its own data structures. The TLBP instruction places the 
index of the TLB entry that must be altered into the Index register. The 
EntryLo register is loaded with a word containing the physical page frame 
and access control bits (with the D bit set), and the EntryHi and EntryLo 
registers are written into the TLB. 
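
The EntryLo word described above can be assembled as in the following C sketch. The field positions (PFN in bits 29:6, C in bits 5:3, D in bit 2, V in bit 1, G in bit 0) follow the usual R4000-family EntryLo layout and are an assumption here, since this section does not restate them; the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed EntryLo layout (R4000-family): PFN 29:6, C 5:3, D 2, V 1, G 0. */
#define ENTRYLO_PFN_SHIFT 6
#define ENTRYLO_PFN_MASK  0xFFFFFFu   /* 24-bit page frame number */
#define ENTRYLO_C_SHIFT   3
#define ENTRYLO_D         (1u << 2)   /* dirty: page is writable */
#define ENTRYLO_V         (1u << 1)   /* valid */
#define ENTRYLO_G         (1u << 0)   /* global */

/* Build an EntryLo value from a page frame number, a cache attribute,
 * and any combination of the D, V, and G flag bits. */
static uint64_t entrylo_make(uint32_t pfn, unsigned cache_attr, uint32_t flags)
{
    return ((uint64_t)(pfn & ENTRYLO_PFN_MASK) << ENTRYLO_PFN_SHIFT)
         | ((uint64_t)(cache_attr & 0x7u) << ENTRYLO_C_SHIFT)
         | (flags & (ENTRYLO_D | ENTRYLO_V | ENTRYLO_G));
}

/* Set the D bit on an existing EntryLo value, as the TLB Modified
 * handler does once it has decided that writes are permitted. */
static uint64_t entrylo_mark_dirty(uint64_t entrylo)
{
    return entrylo | ENTRYLO_D;
}
```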

Cache Error Exception 

This section explains the Cache Error exception. 

Cause 

The Cache Error exception occurs when a primary cache parity error is 
detected. This exception is maskable by the DE bit of the Status register. 

Processing 

The processor sets the ERL bit in the Status register, saves the exception 
restart address in the ErrorEPC register, and then transfers to a special 
vector in uncached space: 

If the BEV bit = 0, the vector is 0xFFFF FFFF A000 0100. 
If the BEV bit = 1, the vector is 0xFFFF FFFF BFC0 0300. 

No other registers are changed. 

Cache Error exception processing is shown in Figure 5.13 on page 13. 

Servicing 

All errors should be logged. To correct cache parity errors the system 
uses the CACHE instruction to invalidate the cache block, overwrites the 
old data through a cache miss, and resumes execution with an ERET. 

Other errors are not correctable and are likely to be fatal to the current 
process. 

Bus Error Exception 

This section explains the Bus Error exception. 

Cause 

A Bus Error exception is raised by board-level circuitry for events such 
as bus time-out, backplane bus parity errors, and invalid physical memory 
addresses or access types. This exception is not maskable. 

A Bus Error exception occurs only when a cache miss refill, uncached 
reference, or unbuffered write occurs synchronously; a Bus Error 
exception resulting from a buffered write transaction must be reported 
using the general interrupt mechanism. 

Processing 

The common exception vector is used for a Bus Error exception. The IBE 
or DBE code in the ExcCode field of the Cause register is set, signifying 
whether the instruction (as indicated by the EPC register and BD bit in the 
Cause register) caused the exception by an instruction reference, load 
operation, or store operation. 

The EPC register contains the address of the instruction that caused the 
exception, unless it is in a branch delay slot, in which case the EPC register 
contains the address of the preceding branch instruction and the BD bit of 
the Cause register is set. Bus Error processing is shown in Figure 5.15 on 
page 13. 

Servicing 

The physical address at which the fault occurred can be computed from 
information available in the CP0 registers. 

• If the IBE code in the Cause register is set (indicating an instruction 
fetch reference), the virtual address is contained in the EPC register. 

• If the DBE code is set (indicating a load or store reference), the 
instruction that caused the exception is located at the virtual address 
contained in the EPC register (or 4 + the contents of the EPC register 
if the BD bit of the Cause register is set). 

The virtual address of the load and store reference can then be obtained 
by interpreting the instruction. The physical address can be obtained by 
using the TLBP instruction and reading the EntryLo register to compute 
the physical page number. 
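
The first two steps of this procedure can be sketched in C as follows. The sketch assumes that BD is bit 31 of the Cause register and that loads and stores use the standard MIPS I-type encoding (base register in bits 25:21, signed 16-bit offset in bits 15:0); the function names and the saved-register array are hypothetical. The final TLBP lookup is not modeled.

```c
#include <stdint.h>

#define CAUSE_BD (UINT32_C(1) << 31)   /* assumed: BD is bit 31 of Cause */

/* Locate the instruction that took the exception: EPC itself, or EPC + 4
 * when the instruction sits in a branch delay slot. */
static uint64_t faulting_instruction_address(uint64_t epc, uint32_t cause)
{
    return (cause & CAUSE_BD) ? epc + 4 : epc;
}

/* Recompute the effective address of an I-type load or store from the
 * instruction word and the saved general-purpose registers. */
static uint64_t load_store_vaddr(uint32_t instr, const uint64_t gpr[32])
{
    unsigned base   = (instr >> 21) & 0x1Fu;           /* bits 25:21 */
    int64_t  offset = (int16_t)(instr & 0xFFFFu);      /* sign-extend */

    return gpr[base] + (uint64_t)offset;
}
```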

The process executing at the time of this exception is handed a bus error 
signal, which is usually fatal. 

Integer Overflow Exception 

This section explains the Integer Overflow exception. 

Cause 

An Integer Overflow exception occurs when an ADD, ADDI, SUB, DADD, 
DADDI, or DSUB instruction (see footnote 1) results in a 2's complement 
overflow. This exception is not maskable. 

Processing 

The common exception vector is used for this exception, and the OV 
code in the Cause register is set. 

The EPC register contains the address of the instruction that caused the 
exception unless the instruction is in a branch delay slot, in which case 
the EPC register contains the address of the preceding branch instruction 
and the BD bit of the Cause register is set. 

Integer Overflow exception processing is shown in Figure 5.15 on 
page 13. 

Servicing 

The process executing at the time of the exception is handed a floating- 
point exception/integer overflow signal. This error is usually fatal to the 
current process. 



1. See Appendix A for instruction description. 

Trap Exception 

This section explains the Trap exception. 

Cause 

The Trap exception occurs when a TGE, TGEU, TLT, TLTU, TEQ, TNE, 
TGEI, TGEUI, TLTI, TLTUI, TEQI, or TNEI instruction (see footnote 1) 
results in a TRUE condition. This exception is not maskable. 

Processing 

The common exception vector is used for this exception, and the Tr code 
in the Cause register is set. 

The EPC register contains the address of the instruction causing the 
exception unless the instruction is in a branch delay slot, in which case 
the EPC register contains the address of the preceding branch instruction 
and the BD bit of the Cause register is set. 

Trap exception processing is shown in Figure 5.15 on page 13. 

Servicing 

The process executing at the time of a Trap exception is handed a 
floating-point exception/integer overflow signal. This error is usually fatal. 



1. See Appendix A for instruction description. 

System Call Exception 

This section explains the System Call exception. 

Cause 

A System Call exception occurs during an attempt to execute the 
SYSCALL instruction. This exception is not maskable. 

Processing 

The common exception vector is used for this exception, and the Sys 
code in the Cause register is set. 

The EPC register contains the address of the SYSCALL instruction 
unless it is in a branch delay slot, in which case the EPC register contains 
the address of the preceding branch instruction. 

If the SYSCALL instruction is in a branch delay slot, the BD bit of the 
Cause register is set; otherwise this bit is cleared. 

System Call exception processing is shown in Figure 5.15 on page 13. 

Servicing 

When this exception occurs, control is transferred to the applicable 
system routine. 

To resume execution, the EPC register must be altered so that the 
SYSCALL instruction does not re-execute; this is accomplished by adding 
a value of 4 to the EPC register (EPC register + 4) before returning. 

If a SYSCALL instruction is in a branch delay slot, a more complicated 
algorithm, beyond the scope of this description, may be required. 

Breakpoint Exception 

This section explains the Breakpoint exception. 

Cause 

A Breakpoint exception occurs when an attempt is made to execute the 
BREAK instruction. This exception is not maskable. 

Processing 

The common exception vector is used for this exception, and the BP code 
in the Cause register is set. 

The EPC register contains the address of the BREAK instruction unless 
it is in a branch delay slot, in which case the EPC register contains the 
address of the preceding branch instruction. 

If the BREAK instruction is in a branch delay slot, the BD bit of the 
Cause register is set; otherwise the bit is cleared. 

Breakpoint exception processing is shown in Figure 5.15 on page 13. 

Servicing 

When the Breakpoint exception occurs, control is transferred to the 
applicable system routine. Additional distinctions can be made by 
analyzing the unused bits of the BREAK instruction (bits 25:6), and 
loading the contents of the instruction whose address the EPC register 
contains. A value of 4 must be added to the contents of the EPC register 
(EPC register + 4) to locate the instruction if it resides in a branch delay 
slot. 
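
Once the instruction word has been loaded, the unused bits can be extracted as in the following C sketch. It assumes the standard MIPS encoding of BREAK (major opcode 0, SPECIAL, with function field 0x0D); the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* BREAK is a SPECIAL instruction (major opcode 0) with function 0x0D. */
static bool is_break(uint32_t instr)
{
    return (instr >> 26) == 0 && (instr & 0x3Fu) == 0x0Du;
}

/* Extract the 20 bits (25:6) that the hardware does not interpret and
 * that software may use to distinguish breakpoint types. */
static uint32_t break_code(uint32_t instr)
{
    return (instr >> 6) & 0xFFFFFu;
}
```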

To resume execution, the EPC register must be altered so that the 
BREAK instruction does not re-execute; this is accomplished by adding a 
value of 4 to the EPC register (EPC register + 4) before returning. 

If a BREAK instruction is in a branch delay slot, interpretation of the 
branch instruction is required to resume execution. 

Reserved Instruction Exception 

This section explains the Reserved Instruction exception. 

Cause 

The Reserved Instruction exception occurs when one of the following 
conditions occurs: 

• an attempt is made to execute an instruction with an undefined major 
opcode (bits 31:26) 

• an attempt is made to execute a SPECIAL instruction with an unde- 
fined minor opcode (bits 5:0) 

• an attempt is made to execute a REGIMM instruction with an unde- 
fined minor opcode (bits 20:16) 

• an attempt is made to execute 64-bit operations in 32-bit virtual ad- 
dressing when in User or Supervisor modes 

64-bit operations are always valid in Kernel mode regardless of the value 
of the KX bit in the Status register. 

This exception is not maskable. 

Reserved Instruction exception processing is shown in Figure 5.15 on 
page 13. 

Processing 

The common exception vector is used for this exception, and the RI code 
in the Cause register is set. 

The EPC register contains the address of the reserved instruction unless 
it is in a branch delay slot, in which case the EPC register contains the 
address of the preceding branch instruction. 

Servicing 

No instructions in the MIPS ISA are currently interpreted. The process 
executing at the time of this exception is handed an illegal instruction/ 
reserved operand fault signal. This error is usually fatal. 

Coprocessor Unusable Exception 

This section explains the Coprocessor Unusable exception. 

Cause 

The Coprocessor Unusable exception occurs when an attempt is made 
to execute a coprocessor instruction for either: 

• a corresponding coprocessor unit that has not been marked usable, 
or 

• CP0 instructions, when the unit has not been marked usable and the 
process executes in User mode. 

This exception is not maskable. 

Processing 

The common exception vector is used for this exception, and the CPU 
code in the Cause register is set. The contents of the Coprocessor Usage 
Error field of the coprocessor Control register indicate which of the four 
coprocessors was referenced. The EPC register contains the address of the 
unusable coprocessor instruction unless it is in a branch delay slot, in 
which case the EPC register contains the address of the preceding branch 
instruction. 

Coprocessor Unusable exception processing is shown in Figure 5.15 on 
page 13. 

Servicing 

The coprocessor unit to which an attempted reference was made is 
identified by the Coprocessor Usage Error field, which results in one of the 
following situations: 

• If the process is entitled access to the coprocessor, the coprocessor is 
marked usable and the corresponding user state is restored to the 
coprocessor. 

• If the process is entitled access to the coprocessor, but the 
coprocessor does not exist or has failed, interpretation of the 
coprocessor instruction is possible. 

• If the BD bit is set in the Cause register, the branch instruction must 
be interpreted; then the coprocessor instruction can be emulated and 
execution resumed with the EPC register advanced past the 
coprocessor instruction. 

• If the process is not entitled access to the coprocessor, the process 
executing at the time is handed an illegal instruction/privileged 
instruction fault signal. This error is usually fatal. 
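
The first servicing case (identify the coprocessor, then mark it usable) can be sketched in C. This is a minimal sketch under stated assumptions: the bit positions are taken from the MIPS III register layouts rather than from this section (Coprocessor Usage Error field in Cause bits 29:28, usability bits CU0 to CU3 in Status bits 28 to 31), and the function names are illustrative. Restoring the user state to the coprocessor is not shown.

```c
#include <stdint.h>

/* Assumed: the Coprocessor Usage Error (CE) field is Cause bits 29:28. */
static unsigned cause_ce(uint32_t cause)
{
    return (cause >> 28) & 0x3u;
}

/* Assumed: CU0..CU3 are Status bits 28..31. Returns the Status value
 * with coprocessor `cop` (0..3) marked usable. */
static uint32_t status_mark_usable(uint32_t status, unsigned cop)
{
    return status | (UINT32_C(1) << (28 + (cop & 0x3u)));
}

/* Nonzero when coprocessor `cop` (0..3) is marked usable in Status. */
static int status_is_usable(uint32_t status, unsigned cop)
{
    return (int)((status >> (28 + (cop & 0x3u))) & 1u);
}
```

A handler would read CE, decide whether the process is entitled to that unit, and only then write the updated Status value back.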

Floating-Point Exception 

This section explains the Floating-Point exception. 

Cause 

The Floating-Point exception is used by the floating-point coprocessor. 
This exception is not maskable. 

Processing 

The common exception vector is used for this exception, and the FPE 
code in the Cause register is set. 

The contents of the Floating-Point Control/Status register indicate the 
cause of this exception. 

Floating-Point exception processing is shown in Figure 5.15 on page 13. 

Servicing 

This exception is cleared by clearing the appropriate bit in the 
Floating-Point Control/Status register. 

For an unimplemented instruction exception, the kernel should emulate 
the instruction; for other exceptions, the kernel should pass the exception 
to the user program that caused the exception. 

Interrupt Exception 

This section explains the Interrupt exception. 

Cause 

The Interrupt exception occurs when one of the eight interrupt 
conditions is asserted. The significance of these interrupts is dependent 
upon the specific system implementation. 

Each of the eight interrupts can be masked by clearing the 
corresponding bit in the Int-Mask field of the Status register, and all of the 
eight interrupts can be masked at once by clearing the IE bit of the Status 
register. 

Processing 

The common exception vector is used for this exception, and the Int code 
in the Cause register is set. 

The IP field of the Cause register indicates current interrupt requests. It 
is possible that more than one of the bits can be simultaneously set (or 
even no bits may be set if the interrupt is asserted and then deasserted 
before this register is read). 

Interrupt exception processing is shown in Figure 5.15 on page 13. 
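
The masking rules above can be expressed as a small C function that reports which interrupt requests are both pending and unmasked. This is a minimal sketch; the bit positions (IE in Status bit 0, the interrupt mask field in Status bits 15:8, IP in Cause bits 15:8) are assumptions taken from the MIPS III register layouts, and the names are illustrative. It does not model the EXL or ERL bits, which also disable interrupts.

```c
#include <stdint.h>

#define STATUS_IE   UINT32_C(0x1)  /* assumed: IE is Status bit 0           */
#define IP_IM_SHIFT 8              /* assumed: IP and mask fields, bits 15:8 */
#define IP_IM_MASK  UINT32_C(0xFF)

/* Returns a bitmask (bit n = interrupt n) of requests that are pending
 * in Cause and not masked in Status. Returns 0 when IE is clear, which
 * masks all eight interrupts at once. */
static unsigned enabled_pending_interrupts(uint32_t status, uint32_t cause)
{
    unsigned ip = (unsigned)((cause  >> IP_IM_SHIFT) & IP_IM_MASK);
    unsigned im = (unsigned)((status >> IP_IM_SHIFT) & IP_IM_MASK);

    if (!(status & STATUS_IE))
        return 0;
    return ip & im;
}
```

A result of 0 inside the handler is legitimate: as noted above, a request may be deasserted before the register is read.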

Servicing 

If the interrupt is caused by one of the two software-generated 
exceptions (SW1 or SW0), the interrupt condition is cleared by setting the 
corresponding Cause register bit to 0. 

If the interrupt is hardware-generated, the interrupt condition is cleared 
by correcting the condition causing the interrupt pin to be asserted. 

NOTE: Because of the write buffer, a store to an external device does not 
necessarily occur until after other instructions in the pipeline finish. 
The user must therefore ensure that the store has occurred before the 
return from exception instruction (ERET) is executed; otherwise the 
interrupt may be serviced again even though no interrupt should be 
pending. 

Exception Handling and Servicing Flowcharts 

The remainder of this chapter contains flowcharts for the exceptions 
listed in Table 5.12, and guidelines for their handlers. 

Figure                      Description 
--------------------------  --------------------------------------------- 
Figure 5.16, Figure 5.17    General exceptions and their exception handler 
Figure 5.18, Figure 5.19    TLB/XTLB miss exception and their exception 
                            handler 
Figure 5.20                 Cache error exception and its handler 
Figure 5.21                 Reset, soft reset and NMI exceptions, and a 
                            guideline to their handler 

Table 5.12 List of Exception Flowcharts 

Generally speaking, the exceptions are handled by hardware (HW), and 
then the exceptions are serviced by software (SW). 

[Figure 5.16: flowchart. Recoverable box and comment text follows.] 

Scope: exceptions other than Reset, Soft Reset, NMI, CacheErr or 
first-level TLB miss. Note: interrupts can be masked by IE or IM. 

- Set FP Control/Status register; EnHi <- VPN2, ASID; Context <- VPN2; 
  set Cause register (ExcCode, CE). 
  Comment: the FP Control/Status register is set only if the respective 
  exception occurs. EnHi and X/Context are set only for TLB Invalid, 
  Modified, and Refill exceptions. 
- Instruction in branch delay slot. Yes: Cause bit 31 (BD) <- 1. 
  No: Cause bit 31 (BD) <- 0. 
- Check if exception within another exception. 
- Set BadVA; EPC <- (PC - 4) for the branch delay slot case, 
  otherwise set BadVA; EPC <- PC. 
  Comment: BadVA is set only for TLB Invalid, Modified, Refill, and 
  VCED/I exceptions. It is not set for a Bus Error exception. 
- EXL <- 1. 
  Comment: processor forced to Kernel mode and interrupts disabled. 
- =0 (normal): PC <- 0xFFFF FFFF 8000 0000 + 180 (unmapped, cached). 
  =1 (bootstrap): PC <- 0xFFFF FFFF BFC0 0200 + 180 (unmapped, uncached). 
  (Base is sign extended for 64 bits.) 
- To General Exception Servicing Guidelines. 

Figure 5.16 General Exception Handler (HW) 

[Figure 5.17: flowchart. Recoverable box and comment text follows.] 

- MFC0: X/Context, EPC, Status, Cause. 
  Comments: the vector is unmapped, so TLBMod, TLBInv, and TLB Refill 
  exceptions are not possible. EXL = 1, so Interrupt exceptions are 
  disabled. The OS/system must avoid all other exceptions. Only 
  CacheErr, Reset, Soft Reset, and NMI exceptions are possible. 
- MTC0 (set Status bits): KSU <- 00, EXL <- 0, and IE = 1. 
  (Optional: only to enable interrupts while keeping Kernel mode.) 
  Comment: after EXL = 0, all exceptions are allowed (except interrupts 
  if masked by IE or IM, and CacheErr if masked by DE). 
- Check the Cause register and jump to the appropriate service code. 
- Service code. 
- EXL = 1. 
- MTC0: EPC, Status. 
- ERET. 
  Comments: ERET is not allowed in the branch delay slot of another 
  jump instruction. The processor does not execute the instruction that 
  is in the ERET's branch delay slot. PC <- EPC; EXL <- 0; LLbit <- 0. 

Figure 5.17 General Exception Servicing Guidelines (SW) 

[Figure 5.18: flowchart. Recoverable box and comment text follows.] 

- Branch delay slot case: EnHi <- VPN2, ASID; Context <- VPN2; set 
  Cause register (ExcCode, CE) and Cause bit 31 (BD) <- 1; set BadVA; 
  EPC <- (PC - 4). 
- Otherwise: EnHi <- VPN2, ASID; Context <- VPN2; set Cause register 
  (ExcCode, CE) and Cause bit 31 (BD) <- 0; set BadVA; EPC <- PC. 
- Check if exception within another exception. 
- Vector offset: 0x080 or 0x000 (points to Refill exception), or 
  0x180 (points to General exception). 
- EXL <- 1. 
  Comment: processor forced to Kernel mode and interrupts disabled. 
- =0 (normal): PC <- 0xFFFF FFFF 8000 0000 + Vec. Off. 
  (unmapped, cached). 
  =1 (bootstrap): PC <- 0xFFFF FFFF BFC0 0200 + Vec. Off. 
  (unmapped, uncached). 
  (Base is sign extended for 64 bits.) 
- To TLB/XTLB Exception Servicing Guidelines. 

Figure 5.18 TLB/XTLB Miss Exception Handler (HW) 
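
The vector addresses that appear in the hardware flowcharts (Figures 5.16, 5.18, and 5.20) can be computed as base plus offset. This is a minimal sketch: the base addresses and the offsets 0x000, 0x080, 0x100, and 0x180 come from the flowcharts, while the assignment of 0x000 to the TLB refill and 0x080 to the XTLB refill is an assumption taken from the MIPS III architecture. The type and function names are illustrative.

```c
#include <stdint.h>

typedef enum {
    VEC_TLB_REFILL  = 0x000,  /* assumed: 32-bit TLB refill  */
    VEC_XTLB_REFILL = 0x080,  /* assumed: 64-bit XTLB refill */
    VEC_CACHE_ERROR = 0x100,
    VEC_GENERAL     = 0x180
} vector_offset;

/* Returns the 64-bit (sign-extended) address of an exception vector.
 * `bootstrap` selects the =1 (bootstrap) branch of the flowcharts. */
static uint64_t exception_vector(vector_offset off, int bootstrap)
{
    uint64_t base;

    if (bootstrap)
        base = UINT64_C(0xFFFFFFFFBFC00200);   /* unmapped, uncached */
    else if (off == VEC_CACHE_ERROR)
        base = UINT64_C(0xFFFFFFFFA0000000);   /* unmapped, uncached */
    else
        base = UINT64_C(0xFFFFFFFF80000000);   /* unmapped, cached   */

    return base + (uint64_t)off;
}
```

The cache error vector uses an uncached base even in the normal case, so that the handler does not run from a cache that has just reported an error.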

[Figure 5.19: flowchart. Recoverable box and comment text follows.] 

- MFC0: Context. 
  Comments: the vector is unmapped, so TLBMod, TLBInv, TLB Refill, and 
  VCEP exceptions are not possible. EXL = 1, so Interrupt exceptions 
  are disabled. The OS/system must avoid all other exceptions. Only 
  CacheErr, Reset, Soft Reset, and NMI exceptions are possible. 
- Service code. 
  Comments: load the mapping of the virtual address in the Context 
  register, move it to EnLo, and write it into the TLB. There could be a 
  TLB miss again during the mapping of the data or instruction address. 
  The processor will jump to the general exception vector since EXL 
  is 1. (Option to complete the first-level refill in the general 
  exception handler, or ERET to the original instruction and take the 
  exception again.) 
- ERET. 
  Comments: ERET is not allowed in the branch delay slot of another 
  jump instruction. The processor does not execute the instruction that 
  is in the ERET's branch delay slot. PC <- EPC; EXL <- 0; LLbit <- 0. 

Figure 5.19 TLB/XTLB Exception Servicing Guidelines (SW) 

[Figure 5.20: flowchart. Recoverable box and comment text follows.] 

Note: can be masked/disabled by DE (SR16) bit = 1. 

Cache Error Exception Handling (HW): 

- Set CacheErr register. 
- Instruction in branch delay slot? Yes: ErrEPC <- (PC - 4). 
  No: ErrEPC <- PC. 
- ERL <- 1. 
- =0 (normal): PC <- 0xFFFF FFFF A000 0000 + 100 (unmapped, uncached). 
  =1 (bootstrap): PC <- 0xFFFF FFFF BFC0 0200 + 100 (unmapped, uncached). 
  (Base is sign extended for 64 bits.) 

Servicing Guidelines (SW): 

- Service code. 
  Comments: the vector is unmapped and uncached, so TLB-related and 
  Cache Error exceptions are not possible. ERL = 1, so Interrupt 
  exceptions are disabled. The OS/system must avoid all other 
  exceptions. Only Reset, Soft Reset, and NMI exceptions are possible. 
- ERET. 
  Comments: ERET is not allowed in the branch delay slot of another 
  jump instruction. The processor does not execute the instruction that 
  is in the ERET's branch delay slot. PC <- ErrorEPC; ERL <- 0; 
  LLbit <- 0. 

Figure 5.20 Cache Error Exception Handling (HW) 
and Servicing Guidelines (SW) 

[Figure 5.21: flowchart. Recoverable box and comment text follows.] 

- Soft Reset or NMI exception: 
  Status: BEV <- 1, SR <- 1, ERL <- 1. 
- Reset exception: 
  Random <- TLBENTRIES - 1; Wired <- 0; 
  Config <- Update(31:6) || Undef(5:0); 
  Status: BEV <- 1, SR <- 0, ERL <- 1. 
- ErrorEPC <- PC. 
- PC <- 0xFFFF FFFF BFC0 0000. 
- Reset service code. (Optional) 

Note: There is no indication from the processor to differentiate between 
NMI and Soft Reset; there must be a system-level indication. 

Figure 5.21 Reset, Soft Reset & NMI Exception Handling (HW) and Servicing 
Guidelines (SW) 

This chapter describes the R4600 and R4700 floating-point unit (FPU) 
features, including the programming model, instruction set and formats, 
and the pipeline. 

The FPU, with associated system software, fully conforms to the 
requirements of ANSI/IEEE Standard 754-1985, IEEE Standard for Binary 
Floating-Point Arithmetic. In addition, the MIPS architecture fully supports 
the recommendations of the standard and precise exceptions. 

Overview 

The FPU operates as a coprocessor for the CPU (it is assigned 
coprocessor label CP1), and extends the CPU instruction set to perform 
arithmetic operations on floating-point values. 

The R4600/R4700 Floating-Point Coprocessor 

The R4600/R4700 incorporates an entire floating-point coprocessor on 
chip, including a floating-point register file and execution units. The 
floating-point coprocessor forms a seamless interface with the integer unit, 
decoding and executing instructions in parallel with the integer unit. In 
comparison to the R4600, the floating point coprocessor of the R4700 has 
improved floating multiply operations. 

The R4600/R4700 uses the floating-point unit to perform integer 
multiply and divide, and results are placed in the HI and LO registers. The 
values can then be transferred to the general purpose register file using the 
MFHI/MFLO instructions. The R4700 performs an integer multiply faster 
than the R4600 by 2 clock cycles, but it takes the same number of clock 
cycles for integer division. The R4700 improves the multiply compared to 
the R4600 by performing a single-precision multiply in 4 clock cycles, and 
a double-precision multiply in 5 clock cycles. 

Figure 6.1 illustrates the functional organization of the FPU. 

Figure 6.1 FPU Functional Block Diagram 

FPU Features 

This section briefly describes the operating model, the load/store 
instruction set, and the coprocessor interface in the FPU. A more detailed 
description is given in the sections that follow. 

• Full 64-bit Operation. When the FR bit in the CPU Status register 
equals 0, the FPU is configured for sixteen 64-bit registers for 
double-precision values or thirty-two 32-bit registers for 
single-precision values. When the FR bit in the CPU Status register 
equals 1, the FPU is configured for thirty-two 64-bit registers. Each 
register can hold single- or double-precision values. The FPU also 
includes a 32-bit Control/Status register that provides access to all 
IEEE-Standard exception handling capabilities. 

• Load and Store Instruction Set. Like the CPU, the FPU uses a load- 
and store-oriented instruction set, with single-cycle load and store 
operations. Overlap of multiply and add is supported. 

• Tightly Coupled Coprocessor Interface. The FPU resides on-chip to 
form a tightly coupled unit with a seamless integration of 
floating-point and fixed-point instruction sets. 

FPU Programming Model 

This section describes the set of FPU registers and their data 
organization. The FPU registers include Floating-Point General Purpose 
registers (FGRs) and two control registers: Control/Status and 
Implementation/Revision. 

Floating-Point General Registers (FGRs) 

The FPU has a set of Floating-Point General Purpose registers (FGRs) that 
can be accessed in the following ways: 

• As 32 general-purpose registers (32 FGRs), each of which is 32 bits 
wide when the FR bit in the CPU Status register equals 0; or as 32 
general-purpose registers (32 FGRs), each of which is 64 bits wide when 
FR equals 1. The CPU accesses these registers through move, load, 
and store instructions. 

• As 16 floating-point registers (see the next section for a description of 
FPRs), each of which is 64 bits wide, when the FR bit in the CPU 
Status register equals 0. The FPRs hold values in either single- or 
double-precision floating-point format. Each FPR corresponds to 
adjacently numbered FGRs as shown in Figure 6.2 on page 6-3. 

• As 32 floating-point registers (see the next section for a description of 
FPRs), each of which is 64 bits wide, when the FR bit in the CPU 
Status register equals 1. The FPRs hold values in either single- or 
double-precision floating-point format. Each FPR corresponds to an 
FGR as shown in Figure 6.2. 

Figure 6.2 FPU Registers 

Floating-Point Registers 

The FPU provides: 

• 16 Floating-Point registers (FPRs) for Status.FR = 0, or 

• 32 Floating-Point registers (FPRs) for Status.FR = 1. 

These 64-bit registers hold floating-point values during floating-point 
operations and are physically formed from the General Purpose registers 
(FGRs). When the FR bit in the Status register equals 1, the FPR references 
a single 64-bit FGR. 

The FPRs hold values in either single- or double-precision floating-point 
format. If the FR bit equals 0, only even numbers (the least register, as 
shown in Figure 6.2) can be used to address FPRs. When the FR bit is set 
to a 1, all FPR register numbers are valid. 

If the FR bit equals 0 during a double-precision floating-point operation, 
the general registers are accessed in double pairs. Thus, in a 
double-precision operation, selecting Floating-Point Register 0 (FPR0) 
actually addresses adjacent Floating-Point General Purpose registers 
FGR0 and FGR1. 
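
The FPR-to-FGR correspondence described above can be written as a small C function. This is a minimal sketch; the function name and its return convention are illustrative.

```c
/* Writes the FGR number(s) backing floating-point register `fpr` into
 * fgr[] and returns how many were written. Returns 0 when `fpr` is not
 * a valid FPR number for the given FR setting. */
static int fpr_to_fgrs(unsigned fpr, int fr, unsigned fgr[2])
{
    if (fpr > 31)
        return 0;
    if (fr) {              /* FR = 1: each FPR is a single 64-bit FGR */
        fgr[0] = fpr;
        return 1;
    }
    if (fpr & 1u)          /* FR = 0: only even FPR numbers are valid */
        return 0;
    fgr[0] = fpr;          /* FR = 0: adjacent pair FGRn and FGRn+1   */
    fgr[1] = fpr + 1;
    return 2;
}
```

With FR = 0, FPR0 maps to FGR0 and FGR1; with FR = 1, FPR3 maps to FGR3 alone.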

Floating-Point Control Registers 

The FPU has 32 control registers (FCRs) that can only be accessed by 
move operations. The FCRs are described below: 

• The Implementation/Revision register (FCR0) holds revision 
information about the FPU. 

• The Control/Status register (FCR31) controls and monitors 
exceptions, holds the result of compare operations, and establishes 
rounding modes. 

• FCR1 to FCR30 are reserved. 

Table 6.1 lists the assignments of the FCRs. 

FCR Number       Use 
---------------  ------------------------------------------------- 
FCR0             Coprocessor implementation and revision register 
FCR1 to FCR30    Reserved 
FCR31            Rounding mode, cause, trap enables, and flags 

Table 6.1 Floating-Point Control Register Assignments 

Implementation and Revision Register (FCR0) 

The read-only Implementation and Revision register (FCR0) specifies the 
implementation and revision number of the FPU. This information can 
determine the coprocessor revision and performance level, and can also be 
used by diagnostic software. 

Figure 6.3 shows the layout of the register; Table 6.2, which follows the 
figure, describes the Implementation and Revision register (FCR0) fields. 

Figure 6.3 Implementation/Revision Register 

Field    Description 
-----    ---------------------------------------- 
Imp      Implementation number 
         R4600: 0x20 
         R4700: 0x21 
Rev      Revision number in the form of y.x 
0        Reserved. 

Table 6.2 FCR0 Fields 




The revision number is a value of the form y.x, where: 

• y is a major revision number held in bits 7:4. 

• x is a minor revision number held in bits 3:0. 

The revision number distinguishes some chip revisions; however, there 
is no guarantee that changes to the chip are necessarily reflected by the 
revision number, or that changes to the revision number necessarily reflect 
real chip changes. For this reason revision number values are not listed, 
and software should not rely on the revision number to characterize the 
chip. 
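
Decoding FCR0 into its fields can be sketched as follows. The positions of y (bits 7:4) and x (bits 3:0) are stated above; the position of the Imp field (bits 15:8) is an assumption taken from the MIPS III layout, since Figure 6.3 is not reproduced here. The structure and function names are illustrative.

```c
#include <stdint.h>

struct fcr0_fields {
    unsigned imp;    /* implementation number (assumed bits 15:8) */
    unsigned major;  /* y, major revision, bits 7:4               */
    unsigned minor;  /* x, minor revision, bits 3:0               */
};

/* Splits a value read from FCR0 into its fields. */
static struct fcr0_fields decode_fcr0(uint32_t fcr0)
{
    struct fcr0_fields f;

    f.imp   = (unsigned)((fcr0 >> 8) & 0xFFu);
    f.major = (unsigned)((fcr0 >> 4) & 0xFu);
    f.minor = (unsigned)(fcr0 & 0xFu);
    return f;
}
```

Software can use the Imp value to tell an R4600 (0x20) from an R4700 (0x21); as noted above, it should not rely on the revision fields to characterize the chip.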

Control/Status Register (FCR31) 

The Control/Status register (FCR31) contains control and status 
information that can be accessed by instructions in either Kernel or User 
mode. FCR31 also controls the arithmetic rounding mode and enables 
User mode traps, as well as identifying any exceptions that may have 
occurred in the most recently executed instruction, along with any 
exceptions that may have occurred without being trapped. 

Figure 6.4 on page 6-5 shows the format of the Control/Status register, 
and Table 6.3, which follows the figure, describes the Control/Status 
register fields. Figure 6.5 on page 6-5 shows the Control/Status register 
Cause, Flag, and Enable fields. 

Figure 6.4 FP Control/Status Register Bit Assignments 

Field      Description 
-------    ------------------------------------------------------------ 
FS         When set, denormalized results are flushed to 0 instead of 
           causing an unimplemented operation exception. 
C          Condition bit. See description of Control/Status register 
           Condition bit. 
Cause      Cause bits. See Figure 6.5 and the description of 
           Control/Status register Cause, Flag, and Enable bits. 
Enables    Enable bits. See Figure 6.5 and the description of 
           Control/Status register Cause, Flag, and Enable bits. 
Flags      Flag bits. See Figure 6.5 and the description of 
           Control/Status register Cause, Flag, and Enable bits. 
RM         Rounding mode bits. See Table 6.4 on page 7 and the 
           description of Control/Status register Rounding Mode 
           Control bits. 

Table 6.3 Control/Status Register Fields 

Figure 6.5 Control/Status Register Cause, Flag, and Enable Fields 

Accessing the Control/Status Register 

When the Control/Status register is read by a Move Control From 
Coprocessor 1 (CFC1) instruction, all unfinished instructions in the 
pipeline are completed before the contents of the register are moved to the 
main processor. If a floating-point exception occurs as the pipeline 
empties, the FP exception is taken and the CFC1 instruction is re-executed 
after the exception is serviced. 

The bits in the Control/Status register can be set or cleared by writing to 
the register using a Move Control To Coprocessor 1 (CTC1) instruction. 
CTC1 is not issued until all previous floating-point operations are 
complete. 

IEEE Standard 754 

IEEE Standard 754 specifies that floating-point operations detect 
certain exceptional cases, raise flags, and can invoke an exception handler 
when an exception occurs. These features are implemented in the MIPS 
architecture with the Cause, Enable, and Flag fields of the Control/Status 
register. The Flag bits implement IEEE 754 exception status flags, and the 
Cause and Enable bits implement exception handling. 

Control/Status Register FS Bit 

When the FS bit is set, denormalized results are flushed to 0 instead of 
causing an unimplemented operation exception. 

Control/Status Register Condition Bit 

When a floating-point Compare operation takes place, the result is 
stored at bit 23, the Condition bit, to save or restore the state of the 
condition line. The C bit is set to 1 if the condition is true; the bit is cleared 
to 0 if the condition is false. Bit 23 is affected only by compare and Move 
Control To FPU instructions. 

Control/Status Register Cause, Flag, and Enable Fields 

Figure 6.5 on page 6-5 illustrates the Cause, Flag, and Enable fields of 
the Control/Status register. 

Cause Bits 

Bits 17:12 in the Control/Status register contain Cause bits, as shown 
in Figure 6.5 on page 6-5, which reflect the results of the most recently 
executed instruction. The Cause bits are a logical extension of the CP0 
Cause register; they identify the exceptions raised by the last floating-point 
operation and raise an interrupt or exception if the corresponding enable 
bit is set. If more than one exception occurs on a single instruction, each 
appropriate bit is set. 

The Cause bits are written by each floating-point operation (but not by 
load, store, or move operations). The Unimplemented Operation (E) bit is 
set to a 1 if software emulation is required, otherwise it remains 0. The 
other bits are set to 1 or 0 to indicate the occurrence or non-occurrence 
(respectively) of an IEEE 754 exception. 

When a floating-point exception is taken, no results are stored, and the 
only state affected is the Cause bits. Exceptions caused by an immediately 
previous floating-point operation can be determined by reading the Cause 
field. 
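
Reading the Cause field and the Condition bit from a saved FCR31 value can be sketched as follows. The field position (bits 17:12) and the Condition bit (bit 23) are stated above. The order of the individual bits within the Cause field (E, V, Z, O, U, I from bit 17 down to bit 12) is an assumption taken from the MIPS III FPU definition, since Figure 6.5 is not reproduced here. All names are illustrative.

```c
#include <stdint.h>

/* Cause field bits after shifting the field down to bit 0 (assumed order). */
enum {
    FP_CAUSE_I = 1 << 0,  /* Inexact                 */
    FP_CAUSE_U = 1 << 1,  /* Underflow               */
    FP_CAUSE_O = 1 << 2,  /* Overflow                */
    FP_CAUSE_Z = 1 << 3,  /* Division by zero        */
    FP_CAUSE_V = 1 << 4,  /* Invalid operation       */
    FP_CAUSE_E = 1 << 5   /* Unimplemented Operation */
};

/* Cause field, bits 17:12 of the Control/Status register. */
static unsigned fcr31_cause(uint32_t fcr31)
{
    return (unsigned)((fcr31 >> 12) & 0x3Fu);
}

/* Condition bit, bit 23 of the Control/Status register. */
static int fcr31_condition(uint32_t fcr31)
{
    return (int)((fcr31 >> 23) & 1u);
}
```

Because more than one exception can occur on a single instruction, callers should test individual bits of the returned field rather than compare it to a single value.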

Enable Bits 

A floating-point operation that sets an enabled Cause bit forces an 
immediate exception, as does setting both Cause and Enable bits with 
CTC1. The floating-point exception or interrupt is enabled when the 
corresponding enable bit is set. 

There is no enable for Unimplemented Operation (E). Setting 
Unimplemented Operation always generates a floating-point exception. 

Before returning from a floating-point exception, or doing a CTC1, 
software must first clear the enabled Cause bits to prevent a repeat of the 
interrupt. Thus, User mode programs can never observe enabled Cause 
bits set; if this information is required in a User mode handler, it must be 
passed somewhere other than the Status register. 

For a floating-point operation that sets only unenabled Cause bits, no 
exception occurs and the default result defined by IEEE 754 is stored. In 
this case, the exceptions that were caused by the immediately previous 
floating-point operation can be determined by reading the Cause field. 
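
The rule for when a Control/Status value forces an exception, and the clearing step required before returning, can be sketched as follows. The Cause field position (bits 17:12) is stated in the text; the Enable field position (bits 11:7) and the position of the E bit at the top of the Cause field are assumptions taken from the MIPS III FPU definition. The names are illustrative.

```c
#include <stdint.h>

#define CAUSE_SHIFT  12     /* Cause field: bits 17:12                   */
#define ENABLE_SHIFT 7      /* Enable field: bits 11:7 (assumed)         */
#define CAUSE_E      0x20u  /* Unimplemented Operation (assumed top bit) */

/* Nonzero when this Control/Status value forces a floating-point
 * exception: E is set (it has no enable), or an enabled Cause bit is set. */
static int fp_exception_forced(uint32_t fcr31)
{
    unsigned cause   = (unsigned)((fcr31 >> CAUSE_SHIFT) & 0x3Fu);
    unsigned enables = (unsigned)((fcr31 >> ENABLE_SHIFT) & 0x1Fu);

    return (cause & CAUSE_E) != 0 || (cause & enables) != 0;
}

/* Returns the value with the enabled Cause bits (and E) cleared, as
 * software must do before returning or issuing a CTC1. */
static uint32_t clear_enabled_cause_bits(uint32_t fcr31)
{
    uint32_t enables = (fcr31 >> ENABLE_SHIFT) & 0x1Fu;
    uint32_t mask    = (enables | CAUSE_E) << CAUSE_SHIFT;

    return fcr31 & ~mask;
}
```

Unenabled Cause bits are left in place by the second function, so they can still be read afterwards as described above.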

Flag Bits 

When an exception case is detected and the exception Enable is not set, 
the corresponding flag bit is set. If an exception is taken, none of the flag 
bits are modified. Note however that system software may set the flag bits 
before invoking a user exception handler. 

The Flag bits are cumulative and indicate that an exception was raised 
by an operation that was executed since they were explicitly reset. Flag bits 
are set to 1 if an IEEE 754 exception is raised, otherwise they remain 
unchanged. The Flag bits are never cleared as a side effect of floating-point 
operations; however, they can be set or cleared by writing a new value into 
the Control/Status register, using a Move Control To Coprocessor 1 (CTC1) 
instruction. 

Control/Status Register Rounding Mode Control Bits 

Bits 1 and 0 in the Control/Status register constitute the Rounding Mode 
(RM) field. 

As shown in Table 6.4, these bits specify the rounding mode that the 
FPU uses for all floating-point operations. 



Rounding 
Mode 
RM(1:0)   Mnemonic  Description 
--------  --------  -------------------------------------------------- 
0         RN        Round result to nearest representable value; 
                    round to value with least-significant bit 0 when 
                    the two nearest representable values are equally 
                    near. 
1         RZ        Round toward 0: round to value closest to and not 
                    greater in magnitude than the infinitely precise 
                    result. 
2         RP        Round toward +∞: round to value closest to and 
                    not less than the infinitely precise result. 
3         RM        Round toward -∞: round to value closest to and 
                    not greater than the infinitely precise result. 

Table 6.4 Rounding Mode Bit Decoding 
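
The decoding in Table 6.4 can be sketched in C as a lookup on bits 1:0 of a saved Control/Status value, together with a helper that builds a new value with a different rounding mode. This is a minimal sketch; the function names are illustrative, and writing the result back to the FPU (with CTC1) is not shown.

```c
#include <stdint.h>

#define FCR31_RM_MASK UINT32_C(0x3)   /* RM field: bits 1:0 */

/* Returns the Table 6.4 mnemonic for the rounding mode in `fcr31`. */
static const char *rounding_mode_mnemonic(uint32_t fcr31)
{
    static const char *const names[4] = { "RN", "RZ", "RP", "RM" };

    return names[fcr31 & FCR31_RM_MASK];
}

/* Returns `fcr31` with its RM field replaced by `rm` (0..3). */
static uint32_t with_rounding_mode(uint32_t fcr31, unsigned rm)
{
    return (fcr31 & ~FCR31_RM_MASK) | ((uint32_t)rm & FCR31_RM_MASK);
}
```

Only the RM field is modified by the helper; all other Control/Status bits pass through unchanged.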

Floating-Point Formats 

The FPU performs both 32-bit (single-precision) and 64-bit (double- 
precision) IEEE standard floating-point operations. The 32-bit single- 
precision format has a 24-bit signed-magnitude fraction field (f+s) and an 
8-bit exponent (e), as shown in Figure 6.6. 

 31   30       23 22                        0 
+---+------------+---------------------------+ 
| s |     e      |             f             | 
+---+------------+---------------------------+ 
  1       8                   23 

Figure 6.6 Single-Precision Floating-Point Format 

The 64-bit double-precision format has a 53-bit signed-magnitude 
fraction field (f+s) and an 11-bit exponent, as shown in Figure 6.7. 

 63   62       52 51                        0 
+---+------------+---------------------------+ 
| s |     e      |             f             | 
+---+------------+---------------------------+ 
  1       11                  52 

Figure 6.7 Double-Precision Floating-Point Format 

As shown in the above figures, numbers in floating-point format are 
composed of three fields: 

• sign field, $s$ 

• biased exponent, $e = E + \mathrm{bias}$ 

• fraction, $f = .b_1 b_2 \ldots b_{p-1}$ 

The range of the unbiased exponent $E$ includes every integer between 
the two values $E_{min}$ and $E_{max}$ inclusive, together with two other 
reserved values: 

• $E_{min} - 1$ (to encode ±0 and denormalized numbers) 

• $E_{max} + 1$ (to encode ±∞ and NaNs [Not a Number]) 

For single- and double-precision formats, each representable nonzero 
numerical value has just one encoding. 

For single- and double-precision formats, the value of a number, $v$, is 
determined by the equations shown in Table 6.5. 



No.   Equation 
---   ------------------------------------------------------------------ 
(1)   if $E = E_{max}+1$ and $f \neq 0$, then $v$ is NaN, regardless of $s$ 
(2)   if $E = E_{max}+1$ and $f = 0$, then $v = (-1)^s \infty$ 
(3)   if $E_{min} \le E \le E_{max}$, then $v = (-1)^s 2^E (1.f)$ 
(4)   if $E = E_{min}-1$ and $f \neq 0$, then $v = (-1)^s 2^{E_{min}} (0.f)$ 
(5)   if $E = E_{min}-1$ and $f = 0$, then $v = (-1)^s 0$ 

Table 6.5 Equations for Calculating Values in Single and 
Double-Precision Floating-Point Format 

For all floating-point formats, if v is NaN, the most-significant bit of f 
determines whether the value is a signaling or quiet NaN: v is a signaling 
NaN if the most-significant bit of f is set, otherwise, v is a quiet NaN. 

Table 6.6 defines the values for the format parameters. Minimum and 
maximum floating-point values are given in Table 6.7. 



                             Format 
Parameter                 Single    Double 
------------------------  --------  -------- 
f                         24        53 
Emax                      +127      +1023 
Emin                      -126      -1022 
Exponent bias             +127      +1023 
Exponent width in bits    8         11 
Integer bit               hidden    hidden 
Fraction width in bits    24        53 
Format width in bits      32        64 

Table 6.6 Floating-Point Format Parameter Values 



Type                   Value 
-------------------    ------------------------- 
Float Minimum          1.40129846e-45 
Float Minimum Norm     1.17549435e-38 
Float Maximum          3.40282347e+38 
Double Minimum         4.9406564584124654e-324 
Double Minimum Norm    2.2250738585072014e-308 
Double Maximum         1.7976931348623157e+308 

Table 6.7 Minimum and Maximum Floating-Point Values 

Binary Fixed-Point Format 

Binary fixed-point values are held in 2's complement format. Unsigned 
fixed-point values are not directly provided by the floating-point 
instruction set. Figure 6.8 illustrates binary fixed-point format; Table 6.8, 
which follows the figure, lists the binary fixed-point format fields. 




Figure 6.8 Binary Fixed-Point Format 



Field      Description 
-------    ------------- 
sign       sign bit 
integer    integer value 

Table 6.8 Binary Fixed-Point Format Fields 

Floating-Point Instruction Set Overview 

All FPU instructions are 32 bits long, aligned on a word boundary. They 
can be divided into the following groups: 

• Load, Store, and Move instructions move data between memory, the 
main processor, and the FPU General Purpose registers. 

• Conversion instructions perform conversion operations between the 
various data formats. 

• Computational instructions perform arithmetic operations on 
floating-point values in the FPU registers. 

• Compare instructions perform comparisons of the contents of 
registers and set a conditional bit based on the results. 

• Branch on FPU Condition instructions perform a branch to the 
specified target if the specified coprocessor condition is met. 

Table 6.9 through Table 6.12 list the instruction set of the FPU. A 
complete description of each instruction is provided in Appendix B. 

In the instruction formats shown in Table 6.9 through Table 6.12, the 
fmt appended to the instruction opcode specifies the data format: s 
specifies single-precision binary floating-point, d specifies double- 
precision binary floating-point, and w specifies binary fixed-point. 



OpCode    Description 
------    --------------------------- 
LWC1      Load Word to FPU 
SWC1      Store Word from FPU 
LDC1      Load Doubleword to FPU 
SDC1      Store Doubleword From FPU 
MTC1      Move Word To FPU 
MFC1      Move Word From FPU 
CTC1      Move Control Word To FPU 
CFC1      Move Control Word From FPU 
DMTC1     Doubleword Move To FPU 
DMFC1     Doubleword Move From FPU 

Table 6.9 FPU Instruction Summary: Load, Move and Store Instructions 



OpCode         Description 
-----------    -------------------------------------------- 
CVT.S.fmt      Floating-point Convert to Single FP 
CVT.D.fmt      Floating-point Convert to Double FP 
CVT.W.fmt      Floating-point Convert to Single Fixed Point 
ROUND.w.fmt    Floating-point Round 
TRUNC.w.fmt    Floating-point Truncate 
CEIL.w.fmt     Floating-point Ceiling 
FLOOR.w.fmt    Floating-point Floor 

Table 6.10 FPU Instruction Summary: Conversion Instructions 

OpCode      Description 
--------    ----------------------------- 
ADD.fmt     Floating-point Add 
SUB.fmt     Floating-point Subtract 
MUL.fmt     Floating-point Multiply 
DIV.fmt     Floating-point Divide 
ABS.fmt     Floating-point Absolute Value 
MOV.fmt     Floating-point Move 
NEG.fmt     Floating-point Negate 
SQRT.fmt    Floating-point Square Root 

Table 6.11 FPU Instruction Summary: Computational Instructions 



OpCode        Description 
----------    -------------------------- 
C.cond.fmt    Floating-point Compare 
BC1T          Branch on FPU True 
BC1F          Branch on FPU False 
BC1TL         Branch on FPU True Likely 
BC1FL         Branch on FPU False Likely 

Table 6.12 FPU Instruction Summary: Compare and Branch Instructions 

Floating-Point Load, Store, and Move Instructions 

This section discusses the manner in which the FPU uses the load, store 
and move instructions listed in Table 6.9 on page 10; Appendix B provides 
a detailed description of each instruction. 

Transfers Between FPU and Memory 

All data movement between the FPU and memory is accomplished by 
using one of the following instructions: 

• Load Word To Coprocessor 1 (LWC1) or Store Word From Coprocessor 1 
(SWC1) instructions, which reference a single 32-bit word of the FPU 
general registers 

• Load Doubleword (LDC1) or Store Doubleword (SDC1) instructions, 
which reference a 64-bit doubleword. 

These load and store operations are unformatted; no format conversions 
are performed and therefore no floating-point exceptions can occur due to 
these operations. 
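
The difference between an unformatted transfer and a format conversion can be illustrated with a small host-side sketch. It is not processor code; the helper names bits_of_single and single_from_bits are hypothetical and exist only for this illustration.

```python
import struct

def bits_of_single(value: float) -> int:
    """Return the 32-bit pattern that encodes value in single precision."""
    return struct.unpack(">I", struct.pack(">f", value))[0]

def single_from_bits(word: int) -> float:
    """Reinterpret a 32-bit word as a single-precision value, bits unchanged."""
    return struct.unpack(">f", struct.pack(">I", word))[0]

# An unformatted transfer (LWC1/SWC1 style) carries the bit pattern across
# untouched, so the word 0x3F800000 is still the encoding of 1.0.
word = 0x3F800000
assert single_from_bits(word) == 1.0

# A format conversion would instead treat the same word as the integer
# 1065353216 and produce that numeric value.
assert float(word) == 1065353216.0
```

Because the transfer only copies bits, there is nothing to round and nothing that could raise a floating-point exception, which is the point made above.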

With the LDC1 and SDC1 instructions the R4600/R4700 floating-point 
unit can take advantage of the 64-bit wide data cache and issue a 
coprocessor load or store double-word instruction with every cycle. 

Transfers Between FPU and CPU 

Data can also be moved directly between the FPU and the CPU by using 
one of the following instructions: 

• Move To Coprocessor 1 (MTC1) 

• Move From Coprocessor 1 (MFC1) 

• Doubleword Move To Coprocessor 1 (DMTC1) 

• Doubleword Move From Coprocessor 1 (DMFC1) 

Like the floating-point load and store operations, these operations 
perform no format conversions and never cause floating-point exceptions. 

Load Delay and Hardware Interlocks 

The instruction immediately following a load can use the contents of the 
loaded register. In such cases the hardware interlocks, requiring 
additional real cycles; for this reason, scheduling load delay slots is 
desirable, although it is not required for functional code. 

Data Alignment 

All coprocessor loads and stores reference the following aligned data 
items: 

• For word loads and stores, the access type is always WORD, and the 
low-order 2 bits of the address must always be 0. 

• For doubleword loads and stores, the access type is always DOUBLE- 
WORD, and the low-order 3 bits of the address must always be 0. 
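
The two alignment rules above amount to a mask test on the low-order address bits. A minimal sketch, with the function name is_aligned chosen only for this illustration:

```python
WORD_BYTES = 4        # LWC1 / SWC1 access size
DOUBLEWORD_BYTES = 8  # LDC1 / SDC1 access size

def is_aligned(address: int, size_bytes: int) -> bool:
    """True when the low-order bits required to be 0 for this size are 0."""
    return (address & (size_bytes - 1)) == 0

# 0x1004 has its low 2 bits clear but not its low 3 bits, so it is a
# valid word address and an invalid doubleword address.
assert is_aligned(0x1004, WORD_BYTES)
assert not is_aligned(0x1004, DOUBLEWORD_BYTES)
```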

Endianness 

Regardless of byte-numbering order (endianness) of the data, the 
address specifies the byte that has the smallest byte address in the 
addressed field. For a big-endian system, it is the leftmost byte; for a little- 
endian system, it is the rightmost byte. 
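
As an illustration of the rule above, the following host-side sketch lays out one 32-bit word in both byte orders and reads the byte at the smallest address. The value 0x11223344 and the function name are arbitrary choices for the example.

```python
import struct

def byte_at_lowest_address(word: int, big_endian: bool) -> int:
    """Return the byte stored at the smallest address of a 32-bit word."""
    layout = struct.pack(">I" if big_endian else "<I", word)
    return layout[0]

WORD = 0x11223344
# Big-endian: the smallest address holds the leftmost byte.
assert byte_at_lowest_address(WORD, big_endian=True) == 0x11
# Little-endian: the smallest address holds the rightmost byte.
assert byte_at_lowest_address(WORD, big_endian=False) == 0x44
```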

Floating-Point Conversion Instructions 

Conversion instructions perform conversions between the various data 
formats such as single- or double-precision, fixed- or floating-point 
formats. Table 6.10 on page 10 lists conversion instructions; Appendix B 
gives a detailed description of each instruction. 
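
The four explicit float-to-fixed conversions differ only in rounding direction. The sketch below models that difference on the host; it assumes ROUND.W.fmt rounds to nearest with ties to even, and it does not model CVT.W.fmt (which follows the current rounding mode), NaN or infinite sources. The function name to_fixed is hypothetical.

```python
import math

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def to_fixed(value: float, mode: str) -> int:
    """Convert a finite value to a 32-bit fixed-point integer."""
    if mode == "ROUND":
        result = round(value)        # nearest, ties to even
    elif mode == "TRUNC":
        result = math.trunc(value)   # toward zero
    elif mode == "CEIL":
        result = math.ceil(value)    # toward plus infinity
    elif mode == "FLOOR":
        result = math.floor(value)   # toward minus infinity
    else:
        raise ValueError(f"unknown conversion mode: {mode}")
    if not INT32_MIN <= result <= INT32_MAX:
        # The hardware reports this case as an exception instead.
        raise OverflowError("source out of 32-bit integer range")
    return result

assert [to_fixed(-2.5, m) for m in ("ROUND", "TRUNC", "CEIL", "FLOOR")] == [-2, -2, -2, -3]
```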

Floating-Point Computational Instructions 

Computational instructions perform arithmetic operations on floating- 
point values in registers. Table 6.11 on page 11 lists the computational 
instructions and Appendix B provides a detailed description of each 
instruction. There are two categories of computational instructions: 

• 3-Operand Register-Type instructions, which perform floating-point 
addition, subtraction, multiplication, division, and square root. 

• 2-Operand Register-Type instructions, which perform floating-point 
absolute value, move, and negate. 

Branch on FPU Condition Instructions 

Table 6.12 on page 11 lists the Branch on FPU (coprocessor unit 1) 
condition instructions that can test the result of the FPU compare (C.cond) 
instructions. Appendix B gives a detailed description of each instruction. 

Floating-Point Compare Operations 

The floating-point compare (C.cond.fmt) instructions interpret the 
contents of two FPU registers (fs, ft) in the specified format (fmt) and 
arithmetically compare them. A result is determined based on the 
comparison and the conditions (cond) specified in the instruction. 

Table 6.12 on page 11 lists the compare instructions; Appendix B gives 
a detailed description of each instruction. Table 6.13 on page 13 lists the 
mnemonics for the compare instruction conditions. 

Mnemonic  Definition                              Mnemonic  Definition

F         False                                   T         True
UN        Unordered                               OR        Ordered
EQ        Equal                                   NEQ       Not Equal
UEQ       Unordered or Equal                      OLG       Ordered or Less Than or Greater Than
OLT       Ordered Less Than                       UGE       Unordered or Greater Than or Equal
ULT       Unordered or Less Than                  OGE       Ordered Greater Than or Equal
OLE       Ordered Less Than or Equal              UGT       Unordered or Greater Than
ULE       Unordered or Less Than or Equal         OGT       Ordered Greater Than
SF        Signaling False                         ST        Signaling True
NGLE      Not Greater Than or Less Than or Equal  GLE       Greater Than, or Less Than or Equal
SEQ       Signaling Equal                         SNE       Signaling Not Equal
NGL       Not Greater Than or Less Than           GL        Greater Than or Less Than
LT        Less Than                               NLT       Not Less Than
NGE       Not Greater Than or Equal               GE        Greater Than or Equal
LE        Less Than or Equal                      NLE       Not Less Than or Equal
NGT       Not Greater Than                        GT        Greater Than

Table 6.13 Mnemonics and Definitions of Compare Instruction Conditions 
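
The first eight left-column conditions of Table 6.13 can be read as sets of relations (less, equal, unordered) for which the condition is true; each right-column mnemonic is the negation of its left-hand neighbour. The following sketch models only that truth value. It is an interpretation of the mnemonic definitions, not a description of the hardware encoding, and it does not model the signaling behaviour of the SF through NGT group.

```python
import math

# Relations for which each non-signaling condition is true.
TRUE_WHEN = {
    "F":   set(),
    "UN":  {"unordered"},
    "EQ":  {"equal"},
    "UEQ": {"unordered", "equal"},
    "OLT": {"less"},
    "ULT": {"unordered", "less"},
    "OLE": {"less", "equal"},
    "ULE": {"unordered", "less", "equal"},
}

def relation(fs: float, ft: float) -> str:
    """Classify the relation between two operands."""
    if math.isnan(fs) or math.isnan(ft):
        return "unordered"
    if fs < ft:
        return "less"
    if fs == ft:
        return "equal"
    return "greater"

def compare(cond: str, fs: float, ft: float) -> bool:
    """Model the condition result that a following BC1T would test."""
    return relation(fs, ft) in TRUE_WHEN[cond]

nan = float("nan")
assert compare("ULT", nan, 1.0) and not compare("OLT", nan, 1.0)
```

Testing the negated condition (for instance OGE rather than ULT) corresponds to following the same compare with BC1F instead of BC1T.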

FPU Instruction Pipeline Overview 

The FPU provides an instruction pipeline that parallels the CPU 
instruction pipeline. It shares the same five-stage pipeline architecture 
with the CPU (see Chapter 3). 

Instruction Execution 

Figure 6.9 illustrates the 5-stage FPU pipeline. This is the same as that 
of the integer pipeline but allows for the longer execution times of the 
floating-point instructions. 

Figure 6.9 FPU Instruction Pipeline 

Figure 6.9 on page 6-13 assumes that one instruction is completed 
every PCycle. Most FPU instructions, however, require more than one cycle 
in the EX stage. This means the FPU must stall the pipeline if an 
instruction execution cannot proceed because of register or resource 
conflicts. 

Floating-point operations proceed in parallel with non-floating-point 
operations. Floating-point operations are not allowed to overlap each 
other, with two exceptions: 

• An add operation may start 2 cycles after the start of a multiply and 
thus will be completely overlapped by the multiply. 

• A multiply operation may overlap for up to 2 cycles, as follows: 

R4600: A new multiply may start 6 cycles after another multiply. 
R4700: A new multiply may start 4 cycles after another multiply 
(for both single and double precision). 

Integer and other non-floating-point operations may be executed in 
parallel with the floating-point operations. All of this is handled 
automatically by internal hardware in the R4600/R4700. 

Instruction Execution Cycle Time 

Unlike the CPU, which executes almost all instructions in a single cycle, 
more time may be required to execute FPU instructions. 

Table 6.14 gives the minimum latency of each floating-point operation. 



Operation      Pipeline Cycles     Operation      Pipeline Cycles
               Single   Double                    Single   Double

ADD.fmt           4        4       BC1T                1
SUB.fmt           4        4       BC1F                1
MUL.fmt                            BC1TL               1
  R4600           8        8
  R4700           4        5
DIV.fmt          32       61       BC1FL               1
SQRT.fmt         31       60       LWC1, LDC1          2
ABS.fmt           1        1       SWC1, SDC1          1
MOV.fmt           1        1       TRUNC.W.fmt      4        4
NEG.fmt           1        1       MTC1, DMTC1         2
ROUND.W.fmt       4        4       MFC1, DMFC1         2
CEIL.W.fmt        4        4       CTC1                3
FLOOR.W.fmt       4        4       CFC1                2
CVT.S.fmt        (a)       4       CMP              3        3
CVT.D.fmt         2       (a)      FIX              4        4
CVT.W.fmt         4        4       FLOAT            6        6
C.cond.fmt        3        3

Note: (a) These operations are illegal. 

Table 6.14 Floating-Point Operation Latencies 

Instruction Scheduling Constraints 

The FPU resource scheduler only issues instructions to the FPU op units 
(adder and multiplier) when no hardware use conflicts will occur. In 
addition, some overlap possibilities are disallowed to keep the scheduler 
simple (and/or increase performance). 

FPU Multiplier Constraints 

The FPU multiplier is partially pipelined in the R4600, allowing a new 
multiply to begin every 6 cycles. It is more fully pipelined in the R4700, 
allowing a new multiply to begin every 4 cycles. 
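
As a rough worked example of what partial pipelining buys, the sketch below estimates the cycles needed for a run of back-to-back multiplies. It assumes the multiplies are independent, that nothing else stalls the pipeline, and that the last multiply still needs its full Table 6.14 latency; the function name and dictionary layout are illustrative only.

```python
# Cycles between successive multiply starts (from the text above).
MULTIPLY_ISSUE_INTERVAL = {"R4600": 6, "R4700": 4}

# MUL.fmt latencies from Table 6.14, keyed by (processor, format).
MULTIPLY_LATENCY = {
    ("R4600", "s"): 8, ("R4600", "d"): 8,
    ("R4700", "s"): 4, ("R4700", "d"): 5,
}

def cycles_for_multiplies(cpu: str, fmt: str, count: int) -> int:
    """Estimate cycles for `count` independent back-to-back multiplies."""
    if count <= 0:
        return 0
    interval = MULTIPLY_ISSUE_INTERVAL[cpu]
    latency = MULTIPLY_LATENCY[(cpu, fmt)]
    # Every multiply but the last costs one issue interval; the last one
    # costs its full latency.
    return (count - 1) * interval + latency

# Four double-precision multiplies: 3*6 + 8 on the R4600, 3*4 + 5 on the R4700.
assert cycles_for_multiplies("R4600", "d", 4) == 26
assert cycles_for_multiplies("R4700", "d", 4) == 17
```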

FPU Adder Constraints 

The FPU scheduler may issue an add operation (ADD.fmt or SUB.fmt) 2 
cycles after a multiply (MUL.fmt). 

Resource Scheduling Rules 

The FPU Resource Scheduler issues instructions while adhering to the 
rules described below. These scheduling rules optimize op unit executions; 
if the rules are not followed, the hardware interlocks to guarantee correct 
operation. 

DIV.[S,D] can start only when all of the following conditions are met in 
the 1A phase. 

• The adder is idle (division is performed in the adder). 

• The multiplier is idle. 

MUL.[S,D] can start only when all of the following conditions are met in 
the 1A phase. 

• The multiplier is one of the following: 

- idle. 

- started execution of the current multiply at least 6 cycles earlier 
(4 cycles on the R4700; see FPU Multiplier Constraints). 

• The adder is idle. 

SQRT.[S,D] can start when the following conditions are met in the 1A 
phase. 

• The adder is idle. 

• The multiplier must be idle. 

CVT.fmt instructions can only start when all of the following conditions 
are met in the 1A phase. 

• The adder is idle. 

• The multiplier is idle. 

ADD.[S,D] or SUB.[S,D] can start only when all of the following 
conditions are met in the 1A phase. 

• The adder is idle. 

• The multiplier is either: 

- idle. 

- started execution of the current multiply at least 2 cycles earlier. 

NEG.[S,D] or ABS.[S,D] can start only when all of the following 
conditions are met in the 1A phase. 

• The adder is idle. 

• The multiplier is idle. 

C.COND.[S,D] can start only when all of the following conditions are met 
in the 1A phase. 

• The adder is idle. 

• The multiplier is idle. 
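
The start conditions listed above reduce to one adder test plus a per-operation multiplier test. The sketch below encodes them as a predicate; it is a reading of the rules in this section, not a model of the actual scheduler logic. The function name, the operation labels, and the use of None for an idle multiplier are conventions invented for the example, and the default reissue distance of 6 cycles is the R4600 figure (pass 4 to approximate the R4700).

```python
from typing import Optional

ADDER_ONLY_OPS = ("DIV", "SQRT", "CVT", "NEG", "ABS", "C.COND")

def can_start(op: str,
              adder_idle: bool,
              cycles_since_multiply_start: Optional[int],
              multiply_reissue_cycles: int = 6) -> bool:
    """Return True when `op` may start under the rules above.

    cycles_since_multiply_start is None when the multiplier is idle.
    """
    multiplier_idle = cycles_since_multiply_start is None
    if op == "MUL":
        multiplier_ok = (multiplier_idle or
                         cycles_since_multiply_start >= multiply_reissue_cycles)
    elif op in ("ADD", "SUB"):
        # An add may start 2 cycles after the start of a multiply.
        multiplier_ok = multiplier_idle or cycles_since_multiply_start >= 2
    elif op in ADDER_ONLY_OPS:
        multiplier_ok = multiplier_idle
    else:
        raise ValueError(f"unknown operation: {op}")
    return adder_idle and multiplier_ok

assert can_start("ADD", True, 2)       # overlaps a running multiply
assert not can_start("DIV", True, 7)   # divide needs the multiplier idle
```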
This chapter describes FPU floating-point exceptions, including FPU 
exception types, exception trap processing, exception flags, saving and 
restoring state when handling an exception, and trap handlers for IEEE 
Standard 754 exceptions. 

A floating-point exception occurs whenever the FPU cannot handle 
either the operands or the results of a floating-point operation in its normal 
way. The FPU responds by generating an exception to initiate a software 
trap or by setting a status flag. 

Exception Types 

The FP Control/Status register described in Chapter 6 contains an 
Enable bit for each exception type; exception Enable bits determine 
whether an exception will cause the FPU to initiate a trap or set a status 
flag. 

• If a trap is taken, the FPU remains in the state found at the beginning 
of the operation and a software exception handling routine executes. 

• If no trap is taken, an appropriate value is written into the FPU des- 
tination register and execution continues. 

The FPU supports the five IEEE Standard 754 exceptions: 

• Inexact (I) 

• Underflow (U) 

• Overflow (O) 

• Division by Zero (Z) 

• Invalid Operation (V) 

Each of these exceptions has a Cause bit, an Enable bit, and a Flag bit 
(status flag) in the Control/Status register. 

The FPU adds a sixth exception type, Unimplemented Operation (E). 
This exception indicates the use of a software implementation. The 
Unimplemented Operation exception has no Enable or Flag bit; whenever 
this exception occurs, an unimplemented exception trap is taken. 

Figure 7.1 illustrates the Control/Status register bits that support 
exceptions. 

Each of the five IEEE Standard 754 exceptions (V, Z, O, U, I) is 
associated with a trap under user control, and is enabled by setting one of 
the five Enable bits. When an exception occurs and its corresponding 
Enable bit is not set, both the corresponding Cause and Flag bits are set. 
When an exception occurs and its corresponding Enable bit is set, the 
corresponding Cause bit is set and the subsequent exception processing 
allows a trap to be taken. 
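
The Cause, Enable, and Flag interaction described above can be sketched as a small update function. The bit positions used here (Cause in bits 17:12, Enables in bits 11:7, Flags in bits 6:2) are an assumption based on the usual MIPS FCR31 layout and should be checked against the register figure; the function name record_exception is hypothetical.

```python
# Assumed bit positions within the Control/Status register (FCR31).
CAUSE_SHIFT = {"I": 12, "U": 13, "O": 14, "Z": 15, "V": 16, "E": 17}
ENABLE_SHIFT = {"I": 7, "U": 8, "O": 9, "Z": 10, "V": 11}
FLAG_SHIFT = {"I": 2, "U": 3, "O": 4, "Z": 5, "V": 6}

def record_exception(fcsr: int, exc: str):
    """Return (new_fcsr, trap_taken) after exception `exc` is detected."""
    fcsr |= 1 << CAUSE_SHIFT[exc]           # the Cause bit is always set
    if exc == "E":
        return fcsr, True                    # Unimplemented always traps
    if fcsr & (1 << ENABLE_SHIFT[exc]):
        return fcsr, True                    # enabled: trap, Flag untouched
    fcsr |= 1 << FLAG_SHIFT[exc]             # disabled: set the Flag bit too
    return fcsr, False

# Division by zero with the Z Enable bit clear: Cause and Flag set, no trap.
assert record_exception(0, "Z") == (0x8020, False)
# The same exception with the Z Enable bit set: Cause set, trap taken.
assert record_exception(0x400, "Z") == (0x8400, True)
```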

Exception Trap Processing 

When a floating-point exception trap is taken, the Cause register 
indicates the floating-point coprocessor is the cause of the exception trap. 
The Floating-Point Exception (FPE) code is used, and the Cause bits of the 
floating-point Control/Status register indicate the reason for the floating- 
point exception. These bits are, in effect, an extension of the system 
coprocessor Cause register. 

Flags 

A Flag bit is provided for each IEEE exception. This Flag bit is set to a 
1 on the assertion of its corresponding exception, with no corresponding 
exception trap signaled. 

The Flag bit is reset by writing a new value into the Status register; flags 
can be saved and restored by software either individually or as a group. 

When no exception trap is signaled, the floating-point coprocessor takes 
a default action, providing a substitute value for the exception-causing 
result of the floating-point operation. The particular default action taken 
depends upon the type of exception. Table 7.1 lists the default action 
taken by the FPU for each of the IEEE exceptions. 



Field  Description          Rounding  Default action
                            Mode

I      Inexact exception    Any       Supply a rounded result
U      Underflow exception  Any       Take unimplemented unless FCSR.FS bit is set.
O      Overflow exception   RN        Modify overflow values to ∞ with the sign of
                                      the intermediate result
                            RZ        Modify overflow values to the format's largest
                                      finite number with the sign of the
                                      intermediate result
                            RP        Modify negative overflows to the format's most
                                      negative finite number; modify positive
                                      overflows to +∞
                            RM        Modify positive overflows to the format's
                                      largest finite number; modify negative
                                      overflows to -∞
Z      Division by zero     Any       Supply a properly signed ∞
V      Invalid operation    Any       Supply a quiet Not a Number (NaN)

Table 7.1 Default FPU Exception Actions 
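
The four overflow rows of Table 7.1 can be summarized as a choice between infinity and the largest finite magnitude, depending on rounding mode and sign. A minimal sketch follows; it uses the host's double-precision maximum as the default "largest finite number", and the function name is invented for the example.

```python
import math
import sys

LARGEST_DOUBLE = sys.float_info.max

def default_overflow_result(rounding_mode: str, negative: bool,
                            largest_finite: float = LARGEST_DOUBLE) -> float:
    """Return the substitute value for an overflow when no trap is taken."""
    if rounding_mode == "RN":      # round to nearest: always infinity
        to_infinity = True
    elif rounding_mode == "RZ":    # round toward zero: never infinity
        to_infinity = False
    elif rounding_mode == "RP":    # round toward +infinity
        to_infinity = not negative
    elif rounding_mode == "RM":    # round toward -infinity
        to_infinity = negative
    else:
        raise ValueError(f"unknown rounding mode: {rounding_mode}")
    magnitude = math.inf if to_infinity else largest_finite
    return -magnitude if negative else magnitude

assert default_overflow_result("RP", negative=True) == -LARGEST_DOUBLE
assert default_overflow_result("RP", negative=False) == math.inf
```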

The FPU detects the eight exception causes internally. When the FPU 
encounters one of these unusual situations, it causes either an IEEE 
exception or an Unimplemented Operation exception (E). 

Table 7.2 lists the exception-causing conditions of the IEEE Standard 754. 


FPA Internal          IEEE          Trap    Trap     Notes
Result                Standard 754  Enable  Disable

Inexact result        I             I       I        Loss of accuracy
Exponent overflow     O,I (a)       O,I     O,I      Normalized exponent > Emax
Division by zero      Z             Z       Z        Zero is (exponent = Emin-1, mantissa = 0)
Overflow on convert   V             E       E        Source out of integer range
Signaling NaN source  V             V       V        Signaling NaN source produces quiet NaN result
Invalid operation     V             V       V        0/0, etc.
Exponent underflow    U             E       E        Normalized exponent < Emin
Denormalized source   None          E       E        Exponent = Emin-1 and mantissa <> 0

Note: (a) The IEEE Standard 754 specifies an inexact exception on overflow only if the overflow trap is disabled. 

Table 7.2 FPU Exception-Causing Conditions 

FPU Exceptions 

The following sections describe the conditions that cause the FPU to 
generate each of its exceptions, and detail the FPU response to each 
exception-causing condition. 

Inexact Exception (I) 

The FPU generates the Inexact exception if the rounded result of an 
operation is not exact or if it overflows. The FPU usually examines the 
operands of floating-point operations before execution actually begins, to 
determine (based on the exponent values of the operands) if the operation 
can possibly cause an exception. If there is a possibility of an instruction 
causing an exception trap, the FPU uses a coprocessor stall to execute the 
instruction. 

It is impossible, however, for the FPU to predetermine if an instruction 
will produce an inexact result. If Inexact exception traps are enabled, the 
FPU uses the coprocessor stall mechanism to execute all floating-point 
operations that require more than two cycles. Since this mode of execution 
can impact performance, Inexact exception traps should be enabled only 
when necessary. 

Trap Enabled Results: If Inexact exception traps are enabled, the result 
register is not modified and the source registers are preserved. 

Trap Disabled Results: The rounded or overflowed result is delivered to 
the destination register if no other software trap occurs. 
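
Whether a result is inexact can only be known once it has been computed, which is why the FPU cannot predict this exception from the operand exponents. The host-side sketch below makes the definition concrete for double-precision addition by comparing the rounded sum with the exact rational sum. It assumes finite operands and a finite result; the function name is hypothetical.

```python
from fractions import Fraction

def add_is_inexact(a: float, b: float) -> bool:
    """True when the rounded sum a + b differs from the exact sum."""
    exact = Fraction(a) + Fraction(b)   # exact rational arithmetic
    rounded = a + b                     # rounded to double precision
    return Fraction(rounded) != exact

# 0.5 + 0.25 is exactly representable; 0.1 + 0.2 has to be rounded.
assert not add_is_inexact(0.5, 0.25)
assert add_is_inexact(0.1, 0.2)
```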

Invalid Operation Exception (V) 

The Invalid Operation exception is signaled if one or both of the 
operands are invalid for an implemented operation. When the exception 
occurs without a trap, the MIPS ISA defines the result as a quiet Not a 
Number (NaN). The invalid operations are: 

• Addition or subtraction: magnitude subtraction of infinities, such as: 
(+∞) + (-∞) or (-∞) - (-∞) 

• Multiplication: 0 times ∞, with any signs 

• Division: 0/0, or ∞/∞, with any signs 

• Comparison of predicates involving < or > without ?, when the oper- 
ands are unordered 

• Any arithmetic operation on a signaling NaN. A move (MOV) operation 
is not considered to be an arithmetic operation, but absolute value 
(ABS) and negate (NEG) are considered to be arithmetic operations 
and cause this exception if one or both operands is a signaling NaN. 

• Square root: √x, where x is less than zero 

Software can simulate the Invalid Operation exception for other 
operations that are invalid for the given source operands. Examples of 
these operations include IEEE Standard 754-specified functions 
implemented in software, such as Remainder: x REM y, where y is zero or x 
is infinite; conversion of a floating-point number to a decimal format whose 
value causes an overflow, is infinity, or is NaN; and transcendental 
functions, such as ln(-5) or cos^-1(3). Refer to Appendix B for examples or 
for routines to handle these cases. 

Trap Enabled Results: The original operand values are undisturbed. 

Trap Disabled Results: The FPU sets the Invalid Operation Exception 
flag and a quiet NaN is delivered to the destination register. 

Division-by-Zero Exception (Z) 

The Division-by-Zero exception is signaled on an implemented divide 
operation if the divisor is zero and the dividend is a finite nonzero number. 
Software can simulate this exception for other operations that produce a 
signed infinity, such as ln(0), sec(π/2), csc(0), or 0^-1. 

Trap Enabled Results: The result register is not modified, and the 
source registers are preserved. 

Trap Disabled Results: The result, when no trap occurs, is a correctly 
signed infinity. 

Overflow Exception (O) 

The Overflow exception is signaled when the magnitude of the rounded 
floating-point result, with an unbounded exponent range, is larger than 
the largest finite number of the destination format. (This exception also 
sets the Inexact exception and Flag bits.) 

Trap Enabled Results: The result register is not modified, and the 
source registers are preserved. 

Trap Disabled Results: The result, when no trap occurs, is determined 
by the rounding mode and the sign of the intermediate result. 

Underflow Exception (U) 

Two related events contribute to the Underflow exception: 

• creation of a tiny nonzero result between ±2^Emin which can cause 
some later exception because it is so tiny 

• extraordinary loss of accuracy during the approximation of such tiny 
numbers by denormalized numbers. 

IEEE Standard 754 allows a variety of ways to detect these events, but 
requires they be detected the same way for all operations. 

Tininess can be detected by one of the following methods: 

• after rounding (when a nonzero result, computed as though the expo- 
nent range were unbounded, would lie strictly between ±2^Emin) 

• before rounding (when a nonzero result, computed as though the ex- 
ponent range and the precision were unbounded, would lie strictly be- 
tween ±2^Emin). 

The MIPS architecture requires that tininess be detected after rounding. 

Loss of accuracy can be detected by one of the following methods: 

• denormalization loss (when the delivered result differs from what 
would have been computed if the exponent range were unbounded) 

• inexact result (when the delivered result differs from what would have 
been computed if the exponent range and precision were both un- 
bounded). 

The MIPS architecture requires that loss of accuracy be detected as an 
inexact result. 

Trap Enabled Results: When an underflow trap is enabled, underflow 
is signaled when tininess is detected regardless of loss of accuracy. If 
underflow traps are enabled, the result register is not modified, and the 
source registers are preserved. 




Trap Disabled Results: When an underflow trap is not enabled and 
FCSR.FS is clear, then take an unimplemented exception. When an 
underflow trap is not enabled and FCSR.FS is set, raise Inexact and return 
either 0 or ±2^Emin, as appropriate for the current rounding mode. 

Unimplemented Instruction Exception (E) 

Any attempt to execute an instruction with an operation code or format 
code that has been reserved for future definition sets the Unimplemented 
bit in the Cause field in the FPU Control/Status register and traps. The 
operand and destination registers remain undisturbed and the instruction 
is emulated in software. Any of the IEEE Standard 754 exceptions can 
arise from the emulated operation, and these exceptions in turn are 
simulated. 

The Unimplemented Instruction exception can also be signaled when 
unusual operands or result conditions are detected that the implemented 
hardware cannot handle properly. These include: 

• Denormalized operand 

• Quiet NaN operand 

• Underflow 

• Reserved opcodes 

• Unimplemented formats 

• Conversion of a floating-point number to a fixed point format when an 
overflow occurs or the source operand value is Infinity or a NaN. 

• Operations which are invalid for their format (for instance, CVT.S.S) 

Denormalized and NaN operands are only trapped if the instruction is a 
convert or computational operation. Moves and compares do not trap if 
their operands are either denormalized or NaNs. 

The use of this exception for such conditions is optional; most of these 
conditions are newly developed and are not expected to be widely used in 
early implementations. Loopholes are provided in the architecture so that 
these conditions can be implemented with assistance provided by 
software, maintaining full compatibility with the IEEE Standard 754. 

Trap Enabled Results: The original operand values are undisturbed. 

Trap Disabled Results: This trap cannot be disabled. 

Saving and Restoring State 

Sixteen or thirty-two doubleword coprocessor load or store operations 
save or restore the coprocessor floating-point register state in memory. 
The remainder of the control and status information can be saved or 
restored through Move To/From Coprocessor Control Register instructions, 
together with saving and restoring the processor registers. Normally, the 
Control/Status register is saved first and restored last. 

When the coprocessor Control/Status register (FCR31) is read, and the 
coprocessor is executing one or more floating-point instructions, the 
instruction(s) in progress are either completed or reported as exceptions. 
The architecture requires that no more than one of these pending 
instructions can cause an exception. Information indicating the type of 
exception is placed in the Control/Status register. When state is restored, 
state information in the status word indicates that exceptions are pending. 

Writing a zero value to the Cause field of the Control/Status register clears 
all pending exceptions, permitting normal processing to restart after the 
floating-point register state is restored. 
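
Clearing the Cause field is a mask operation on the saved register image. The sketch below assumes the Cause field occupies bits 17:12 of the Control/Status register, as in the usual MIPS FCR31 layout; this position is an assumption to be checked against the register figure, and both function names are hypothetical.

```python
CAUSE_FIELD_MASK = 0x3F << 12   # assumed Cause field, bits 17:12

def has_pending_exception(fcsr: int) -> bool:
    """True when any Cause bit is set in the register image."""
    return (fcsr & CAUSE_FIELD_MASK) != 0

def clear_cause(fcsr: int) -> int:
    """Return the register image with the Cause field written as zero."""
    return fcsr & ~CAUSE_FIELD_MASK & 0xFFFFFFFF

saved = 0x8420                  # one Cause bit plus lower-order bits set
assert has_pending_exception(saved)
restored = clear_cause(saved)   # lower-order bits are left unchanged
assert restored == 0x0420 and not has_pending_exception(restored)
```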

The Cause field of the Control/Status register holds the results of only 
one instruction; the FPU examines source operands before an operation is 
initiated to determine if this instruction can possibly cause an exception. 
If an exception is possible, the FPU executes the instruction in stall mode 
to ensure that no more than one instruction (that might cause an 
exception) is executed at a time. 

Trap Handlers for IEEE Standard 754 Exceptions 

The IEEE Standard 754 strongly recommends that users be allowed to 
specify a trap handler for any of the five standard exceptions; the trap 
handler can either compute or specify a substitute result to be placed in 
the destination register of the operation. 

By retrieving an instruction using the processor Exception Program 
Counter (EPC) register, the trap handler determines: 

• exceptions occurring during the operation 

• the operation being performed 

• the destination format 

On Overflow or Underflow exceptions (except for conversions), and on 
Inexact exceptions, the trap handler gains access to the correctly rounded 
result by examining source registers and simulating the operation in 
software. 

On Overflow or Underflow exceptions encountered on floating-point 
conversions, and on Invalid Operation and Divide-by-Zero exceptions, the 
trap handler gains access to the operand values by examining the source 
registers of the instruction. 

The IEEE Standard 754 recommends that, if enabled, the overflow and 
underflow traps take precedence over a separate inexact trap. This 
prioritization is accomplished in software; hardware sets the bits for both 
the Inexact exception and the Overflow or Underflow exception. 

Introduction 

This chapter describes the signals used by and in conjunction with the 
R4600/R4700 processor. The signals include the System interface, the 
Clock/Control interface, the Interrupt interface, the Joint Test Action 
Group (JTAG) interface, and the Initialization interface. 

Signals are listed in bold, and low-active signals have a trailing asterisk; 
for instance, the low-active Read Ready signal is RdRdy*. The signal 
description also tells if the signal is an input (the processor receives it) or 
an output (the processor sends it out). 

Figure 8.1 illustrates the functional groupings of the processor signals. 

Figure 8.1 R4600/R4700 Processor Signals 

System Interface Signals 

System Interface signals provide the connection between the R4600/ 
R4700 processor and the other components in the system. Table 8.1 lists 
the system interface signals. 



Name          Definition             Direction     Description

ExtRqst*      External request       Input         An external agent asserts ExtRqst* to
                                                   request use of the System interface.
                                                   The processor grants the request by
                                                   asserting Release*.
Release*      Release interface      Output        In response to the assertion of
                                                   ExtRqst* or a CPU read request, the
                                                   processor asserts Release*, signalling
                                                   to the requesting device that the
                                                   System interface is available.
RdRdy*        Read ready             Input         The external agent asserts RdRdy* to
                                                   indicate that it can accept a
                                                   processor read request.
SysAD(63:0)   System address/        Input/Output  A 64-bit address and data bus for
              data bus                             communication between the processor
                                                   and an external agent.
SysADC(7:0)   System address/        Input/Output  An 8-bit bus containing check bits for
              data check bus                       the SysAD bus.
SysCmd(8:0)   System command/        Input/Output  A 9-bit bus for command and data
              data identifier                      identifier transmission between the
                                                   processor and an external agent.
SysCmdP       System command/        Input/Output  A single, even-parity bit for the
              data identifier                      SysCmd bus.
              bus parity
ValidIn*      Valid input            Input         The external agent asserts ValidIn*
                                                   when it is driving a valid address or
                                                   data on the SysAD bus and a valid
                                                   command or data identifier on the
                                                   SysCmd bus.
ValidOut*     Valid output           Output        The processor asserts ValidOut* when
                                                   it is driving a valid address or data
                                                   on the SysAD bus and a valid command
                                                   or data identifier on the SysCmd bus.
WrRdy*        Write ready            Input         An external agent asserts WrRdy* when
                                                   it can accept a processor write
                                                   request.

Table 8.1 System Interface Signals 

Clock/Control Interface Signals 

The Clock/Control interface signals make up the interface for clocking 
and maintenance. 

Table 8.2 lists the Clock/Control interface signals. 



Name          Definition             Direction  Description

IOOut         I/O output             Output     Reserved for future output. Always High.
IOIn          I/O input              Input      Reserved for future input. Should be
                                                driven High.
MasterClock   Master clock           Input      Master clock input that establishes the
                                                processor operating frequency. It is 1/2
                                                the pipeline frequency.
MasterOut     Master clock out       Output     Master clock output aligned with
                                                MasterClock.
RClock(1:0)   Receive clocks         Output     Two identical receive clocks that
                                                establish the System interface frequency.
SyncOut       Synchronization        Output     SyncOut must be connected to SyncIn
              clock out                         through an interconnect that models the
                                                interconnect between MasterOut, TClock,
                                                RClock, and the external agent.
SyncIn        Synchronization        Input      Synchronization clock input.
              clock in
TClock(1:0)   Transmit clocks        Output     Two identical transmit clocks that
                                                establish the System interface frequency.
Fault*        Fault                  Output     Reserved for future output. Always High.
VccP          Quiet Vcc for PLL      Input      Quiet Vcc for the internal phase locked
                                                loop.
VssP          Quiet Vss for PLL      Input      Quiet Vss for the internal phase locked
                                                loop.

Table 8.2 Clock/Control Interface Signals 

Interrupt Interface Signals 

The Interrupt interface signals make up the interface used by external 
agents to interrupt the R4600/R4700 processor. Six hardware interrupts 
(Int*(5:0)) and one NMI are available on the R4600/R4700. Table 8.3 lists 
the Interrupt interface signals. 



Name        Definition     Direction  Description

Int*(5:0)   Interrupt      Input      Six general processor interrupts, bit-wise
                                      ORed with bits 5:0 of the interrupt register.
NMI*        Nonmaskable    Input      Nonmaskable interrupt, ORed with bit 6 of
            interrupt                 the interrupt register.

Table 8.3 Interrupt Interface Signals 

JTAG Interface Signals 

The R4600/R4700 does not implement JTAG. The signals are provided 
for compatibility with existing R4x00PC designs. 
Table 8.4 lists the JTAG interface signals. 



Name    Definition        Direction  Description

JTDI    JTAG data in      Input      Connected directly to JTDO. No JTAG
                                     implemented. Should be pulled High.
JTCK    JTAG clock input  Input      Unused input. Should be pulled High.
JTDO    JTAG data out     Output     Connected directly to JTDI. If no external
                                     scan used, this is a no connect.
JTMS    JTAG command      Input      Unused input. Should be pulled High.

Table 8.4 JTAG Interface Signals 

Initialization Interface Signals 

The Initialization interface signals make up the interface by which an 
external agent initializes the processor operating parameters. Table 8.5 
lists the Initialization interface signals. 



Name         Definition          Direction   Description

ColdReset*   Cold reset          Input       This signal must be asserted for a power-on
                                             reset or a cold reset. The clocks SClock,
                                             TClock, and RClock begin to cycle and are
                                             synchronized with the deasserted edge of
                                             ColdReset*. ColdReset* must be deasserted
                                             synchronously with MasterClock.

ModeClock    Boot mode clock     Output      Serial boot-mode data clock output; runs at
                                             the MasterClock frequency divided by 256
                                             (MasterClock/256).

ModeIn       Boot mode data in   Input       Serial boot-mode data input.

Reset*       Reset               Input       This signal must be asserted for any reset
                                             sequence. It can be asserted synchronously
                                             or asynchronously for a cold reset, or
                                             synchronously to initiate a warm reset.
                                             Reset* must be deasserted synchronously
                                             with MasterClock.

VCCOk        Vcc is OK           Input       When asserted, this signal indicates to the
                                             processor that Vcc > Vcc min for more than
                                             100 milliseconds and will remain stable.
                                             The assertion of VCCOk initiates the
                                             initialization sequence.



Table 8.5 Initialization Interface Signals

Table 8.6 lists the R4600/R4700 processor signals and their possible
states.



Description                                 Name          I/O   Asserted   3-State   Reset
                                                                State                State

System address/data bus                     SysAD(63:0)   I/O   High       Yes       a
System address/data check bus               SysADC(7:0)   I/O   High       Yes       a
System command/data identifier bus          SysCmd(8:0)   I/O   High       Yes       a
System command/data identifier bus parity   SysCmdP       I/O   High       Yes       a
Valid input                                 ValidIn*      I     Low        No        NA
Valid output                                ValidOut*     O     Low        Yes       b
External request                            ExtRqst*      I     Low        No        NA
Release interface                           Release*      O     Low        Yes       b
Read ready                                  RdRdy*        I     Low        No        NA
Write ready                                 WrRdy*        I     Low        No        NA
Interrupts                                  Int*(5:0)     I     Low        No        NA
Nonmaskable interrupt                       NMI*          I     Low        No        NA
Boot mode data in                           ModeIn        I     High       No        NA
Boot mode clock                             ModeClock     O     High       No        d
JTAG data in                                JTDI          I     High       No        NA
JTAG data out                               JTDO          O     High       Yes       b
JTAG command                                JTMS          I     High       No        NA
JTAG clock input                            JTCK          I     High       No        NA
Transmit clocks                             TClock(1:0)   O     High       Yes       c
Receive clocks                              RClock(1:0)   O     High       Yes       c
Master clock                                MasterClock   I     High       No        NA
Master clock out                            MasterOut     O     High       Yes       c
Synchronization clock out                   SyncOut       O     High       Yes       c
Synchronization clock in                    SyncIn        I     High       No        NA
I/O output                                  IOOut         O     High       Yes       b
I/O input                                   IOIn          I     High       No        NA
Vcc is OK                                   VCCOk         I     High       No        NA
Cold reset                                  ColdReset*    I     Low        No        NA
Reset                                       Reset*        I     Low        No        NA
Fault                                       Fault*        O     Low        Yes       b

Key to Reset State Column:

a    All I/O pins (SysAD[63:0], SysADC[7:0], etc.) remain 3-stated until the
     Reset* signal deasserts.
b    All output-only pins (ValidOut*, Release*, etc.), except the clocks, are
     3-stated until the ColdReset* signal deasserts.
c    All clocks, except ModeClock, are 3-stated until VCCOk asserts.
d    ModeClock is always driven.
NA   Not applicable to input pins.



Table 8.6 R4600/R4700 Processor Signal Summary

Introduction

This chapter describes the R4600/R4700 Initialization interface. This 
includes the reset signal description and types, initialization sequence, 
with signals and timing dependencies, and boot modes, which are set at 
initialization time. 

Signal names are listed in bold letters — for instance the signal VCCOk 
indicates the Vcc voltage is stable. Low-active signals are indicated by an 
asterisk at the end of the name, as in ColdReset*. 

Functional Overview 

The R4600/R4700 processor has the following three types of resets. 
Refer to Figure 9.1 on page 9-4, Figure 9.2 on page 9-5, and Figure 9.3 on 
page 9-6 for timing diagrams of these resets. 

• Power-on reset: Starts when the power supply is turned on and
completely reinitializes the internal state machine of the processor
without saving any state information.

• Cold reset: Restarts all clocks, but the power supply remains stable.
A cold reset completely reinitializes the internal state machine of the
processor without saving any state information.

• Warm reset: Restarts the processor, but does not affect clocks. A warm
reset preserves the processor internal state.

These resets use the VCCOk, ColdReset*, and Reset* input signals,
which are summarized in the next subsection. Each type of reset
operation is described in the sections that follow.

The Initialization interface is a serial interface that operates at the 
frequency of the MasterClock divided by 256 (i.e. MasterClock/256). This 
low-frequency operation allows the initialization information to be stored 
in a low-cost EPROM or PLD. 

Reset and Initialization Signal Descriptions 

This section describes the three reset signals, VCCOk, ColdReset*, and
Reset*, and the two initialization signals, ModeIn and ModeClock.

VCCOk: When asserted(1), VCCOk indicates to the processor that the 5.0
(3.3) volt power supply (Vcc) has been above 4.75 (3.0) volts for
more than 100 milliseconds (ms) and is expected to remain
stable. The assertion of VCCOk initiates the reading of the
boot-time mode control serial stream. This is described in the
subsection "Initialization Sequence" on page 9-4.

ColdReset*: The ColdReset* signal must be asserted (low) for either a
power-on reset or a cold reset. The clocks SClock, TClock, and
RClock begin to cycle and are synchronized with the
de-asserted edge (high) of ColdReset*. ColdReset* must be
de-asserted synchronously with MasterClock.

Reset*: The Reset* signal must be asserted for any reset sequence. It
can be asserted synchronously or asynchronously for a cold
reset, or synchronously to initiate a warm reset. Reset* must
be de-asserted synchronously with MasterClock.

ModeIn: Serial boot mode data in.

ModeClock: Serial boot mode data clock out, at the MasterClock frequency
divided by 256 (MasterClock/256).



1. Asserted means the signal is true, or in its valid state. For example, the
low-active Reset* signal is said to be asserted when it is in a low (true) state;
the high-active VCCOk signal is true when it is asserted high.

Table 9.1 lists the processor signals and their possible states.



Description                                 Name          I/O   Asserted   3-State   Reset
                                                                State                State

System address/data bus                     SysAD(63:0)   I/O   High       Yes       a
System address/data check bus               SysADC(7:0)   I/O   High       Yes       a
System command/data identifier bus          SysCmd(8:0)   I/O   High       Yes       a
System command/data identifier bus parity   SysCmdP       I/O   High       Yes       a
Valid input                                 ValidIn*      I     Low        No        NA
Valid output                                ValidOut*     O     Low        Yes       b
External request                            ExtRqst*      I     Low        No        NA
Release interface                           Release*      O     Low        Yes       b
Read ready                                  RdRdy*        I     Low        No        NA
Write ready                                 WrRdy*        I     Low        No        NA
Interrupts                                  Int*(5:0)     I     Low        No        NA
Nonmaskable interrupt                       NMI*          I     Low        No        NA
Boot mode data in                           ModeIn        I     High       No        NA
Boot mode clock                             ModeClock     O     High       No        d
JTAG data in                                JTDI          I     High       No        NA
JTAG data out                               JTDO          O     High       Yes       b
JTAG command                                JTMS          I     High       No        NA
JTAG clock input                            JTCK          I     High       No        NA
Transmit clocks                             TClock(1:0)   O     High       Yes       c
Receive clocks                              RClock(1:0)   O     High       Yes       c
Master clock                                MasterClock   I     High       No        NA
Master clock out                            MasterOut     O     High       Yes       c
Synchronization clock out                   SyncOut       O     High       Yes       c
Synchronization clock in                    SyncIn        I     High       No        NA
I/O output                                  IOOut         O     High       Yes       b
I/O input                                   IOIn          I     High       No        NA
Vcc is OK                                   VCCOk         I     High       No        NA
Cold reset                                  ColdReset*    I     Low        No        NA
Reset                                       Reset*        I     Low        No        NA
Fault                                       Fault*        O     Low        Yes       b

Key to Reset State Column:

a    All I/O pins (SysAD[63:0], SysADC[7:0], etc.) remain 3-stated until the
     Reset* signal deasserts.
b    All output-only pins (ValidOut*, Release*, etc.), except the clocks, are
     3-stated until the ColdReset* signal deasserts.
c    All clocks, except ModeClock, are 3-stated until VCCOk asserts.
d    ModeClock is always driven.
NA   Not applicable to input pins.



Table 9.1 R4600/R4700 Processor Signal Summary

Power-on Reset

Figure 9.1, Figure 9.2, and Figure 9.3 illustrate the power-on, warm, 
and cold resets. 

This is the sequence for a power-on reset: 

1. Power-on reset applies a stable Vcc of at least 4.75 (3.0) volts from the
5.0 (3.3) volt power supply to the processor. During this time, VCCOk is
deasserted, ColdReset* and Reset* are asserted, and the MasterClock
input oscillates.

2. After at least 100 ms of stable Vcc and MasterClock, the VCCOk 
signal is asserted to the processor. The assertion of VCCOk begins the 
initialization of the processor. After the mode bits have been read in, the 
processor allows its internal phase locked loops to lock, stabilizing the 
processor internal clock, PClock, the SyncOut-SyncIn clock path 
(described in Chapter 10), and the master clock output, MasterOut. 

3. ColdReset* is asserted for at least 64K (2^16) MasterClock cycles
after the assertion of VCCOk. Once the processor reads the boot-time
mode control serial data stream, ColdReset* can be deasserted.
ColdReset* must be deasserted synchronously with MasterClock.

4. The deasserted edge of ColdReset* synchronizes the edges of SClock, 
TClock, and RClock (to all processors, if in a multiprocessor system). 

5. After ColdReset* is deasserted synchronously and SClock, TClock, 
and RClock have stabilized, Reset* is deasserted to allow the processor to 
begin running. (Reset* must be held asserted for at least 64 MasterClock 
cycles after the deassertion of ColdReset*.) Reset* must be deasserted 
synchronously with MasterClock. 

Note: ColdReset* must be asserted when VCCOk asserts. The behavior of the 
processor is undefined if VCCOk asserts while ColdReset* is deasserted. 
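
The minimum durations in the power-on sequence above can be collected into a small checker. This is a sketch for board bring-up notes, not part of the processor specification: the class, field, and function names are hypothetical, and the measured durations are assumed to come from a simulation or a logic-analyzer capture.

```python
from dataclasses import dataclass

MIN_STABLE_MS = 100             # stable Vcc and MasterClock before VCCOk asserts
MIN_COLDRESET_CYCLES = 1 << 16  # ColdReset* held after VCCOk asserts (64K cycles)
MIN_RESET_CYCLES = 64           # Reset* held after ColdReset* deasserts


@dataclass
class PowerOnTiming:
    stable_ms: float                     # stable time before VCCOk asserted
    coldreset_cycles: int                # MasterClock cycles, VCCOk to ColdReset* release
    reset_cycles: int                    # MasterClock cycles, ColdReset* to Reset* release
    coldreset_low_at_vccok: bool = True  # ColdReset* asserted when VCCOk asserts


def check_power_on(t):
    """Return a list of violated constraints (empty if none are violated)."""
    problems = []
    if t.stable_ms < MIN_STABLE_MS:
        problems.append("VCCOk asserted before 100 ms of stable Vcc")
    if not t.coldreset_low_at_vccok:
        problems.append("ColdReset* not asserted when VCCOk asserts")
    if t.coldreset_cycles < MIN_COLDRESET_CYCLES:
        problems.append("ColdReset* released before 64K MasterClock cycles")
    if t.reset_cycles < MIN_RESET_CYCLES:
        problems.append("Reset* released before 64 MasterClock cycles")
    return problems
```

The checker covers only the durations; it does not model the requirement that ColdReset* and Reset* deassert synchronously with MasterClock.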

Cold Reset 

A cold reset can begin anytime after the processor has read the 
initialization data stream, causing the processor to start with the Reset 
exception. 

A cold reset requires the same sequence as a power-on reset except that 
the power is presumed to be stable before the assertion of the reset inputs 
and the deassertion of VCCOk. 

To begin the reset sequence, VCCOk must be deasserted for a minimum 
of 100 ms before reassertion. 

Warm Reset 

To execute a warm reset, the Reset* input is asserted synchronously 
with MasterClock. It is then held asserted for at least 64 MasterClock 
cycles before being deasserted synchronously with MasterClock. The 
processor internal clocks, PClock and SClock, and the System interface 
clocks, TClock and RClock, are not affected by a warm reset. The boot- 
time mode control serial data stream is not read by the processor on a 
warm reset. A warm reset forces the processor to start with a Soft Reset 
exception. 

The master clock output, MasterOut, generates any reset-related 
signals for the processor that must be synchronous with MasterClock. 

After a power-on reset, cold reset, or warm reset, all processor internal 
state machines are reset, and the processor begins execution at the reset 
vector. All processor internal states are preserved during a warm reset, 
although the precise state of the caches depends on whether or not a cache 
miss sequence has been interrupted by resetting the processor state 
machines.

Initialization Sequence

The boot-mode initialization sequence begins immediately after VCCOk
is asserted. As the processor reads the serial stream of 256 bits through
the ModeIn pin, the boot-mode bits initialize all fundamental processor
modes. (The signals used are described in Chapter 8.)

This is the initialization sequence: 

1. The system deasserts the VCCOk signal. The ModeClock output
is held asserted.

2. The processor synchronizes the ModeClock output at the time
VCCOk is asserted. The first rising edge of ModeClock occurs at least 256
MasterClock cycles after VCCOk is asserted. There could be more clock
cycles due to internal delays on the VCCOk signal. After the first rising
edge, each additional rising edge occurs 256 MasterClock cycles after the
previous one.

3. Each bit of the initialization stream is presented at the ModeIn pin
after each rising edge of ModeClock. The processor samples 256
initialization bits from the ModeIn input.
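
As a rough illustration of the timing this implies, the sketch below multiplies the 256 bits of the stream by the 256 MasterClock cycles per ModeClock period. The function names and the 50 MHz example frequency are assumptions made for illustration; the result is a nominal figure, since the text notes that the first ModeClock edge may be delayed by internal delays on VCCOk.

```python
BITS_IN_STREAM = 256              # boot-mode bits sampled from ModeIn
MASTERCLOCKS_PER_MODECLOCK = 256  # ModeClock runs at MasterClock/256


def boot_stream_cycles():
    """Nominal MasterClock cycles needed to shift in the whole stream."""
    return BITS_IN_STREAM * MASTERCLOCKS_PER_MODECLOCK


def boot_stream_time_ms(master_clock_hz):
    """Nominal time to read the stream, in milliseconds."""
    return boot_stream_cycles() / master_clock_hz * 1e3
```

The product is 65,536 cycles, the same figure as the 64K MasterClock cycles for which ColdReset* must remain asserted after VCCOk asserts.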
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Figure 9.1 Power-on Reset 
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Figure 9.2 Cold Reset 
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Figure 9.3 Warm Reset 

Boot-Mode Settings 

Unlike the R4000, the speed of the R4600/R4700 output drivers is 
statically controlled at boot time. 

Table 9.2 lists the processor boot-mode settings. The following rules 
apply to the boot-mode settings listed in the table: 

• Bit 0 of the stream is presented to the processor when VCCOk
is first asserted.

• Selecting a reserved value results in undefined processor behavior.

• Bits 19 to 255 are reserved bits.

• Zeros must be scanned in for all reserved bits.

Serial Bit   Value   Mode Setting

0                    Reserved (must be zero)

1:4                  XmitDatPat: System interface data rate for block
                     writes only (bit 4 most significant)
             0       DDDD
             1       DDxDDx
             2       DDxxDDxx
             3       DxDxDxDx
             4       DDxxxDDxxx
             5       DDxxxxDDxxxx
             6       DxxDxxDxxDxx
             7       DDxxxxxxDDxxxxxx
             8       DxxxDxxxDxxxDxxx
             9-15    Reserved

5:7                  SysCkRatio: PClock to SClock divisor, frequency
                     relationship between SClock, RClock, and TClock and
                     PClock (bit 7 most significant)
             0       Divide by 2
             1       Divide by 3
             2       Divide by 4
             3       Divide by 5
             4       Divide by 6
             5       Divide by 7
             6       Divide by 8
             7       Reserved

8                    EndBit: Specifies byte ordering
             0       Little-endian ordering
             1       Big-endian ordering

9:10                 Non-block Write: Selects the manner in which
                     non-block writes are handled (bit 10 most significant)
             0       R4x00 compatible
             1       Reserved
             2       Pipelined Writes
             3       Write re-issue

11                   TmrIntEn: Disables the timer interrupt on Int*[5]
             0       Enabled Timer Interrupt
             1       Disabled Timer Interrupt

12                   Reserved (must be zero)

13:14                Drv_Out: Output driver slew rate control (bit 14 most
                     significant). Affects only outputs that are not clocks.
             10      100% strength (fastest)
             11      83% strength
             00      67% strength
             01      50% strength (slowest)

15                   TClock[0]
             0       Enabled
             1       Disabled

16                   TClock[1]
             0       Enabled
             1       Disabled

17                   RClock[0]
             0       Enabled
             1       Disabled

18                   RClock[1]
             0       Enabled
             1       Disabled

19:255               Reserved (must be zero)


Table 9.2 Boot-Mode Settings 
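
One way to prepare the contents of a boot-mode EPROM or PLD is to build the 256-bit stream from the settings in Table 9.2. The sketch below is illustrative only: the function and parameter names are hypothetical, the default for Drv_Out (binary 10, 100% strength) is an arbitrary choice, and the mapping of each field onto serial bits simply follows the "most significant bit" notes in the table.

```python
def build_boot_stream(xmit_dat_pat=0, sys_ck_ratio=0, big_endian=False,
                      non_block_write=0, timer_int_disabled=False,
                      drv_out=0b10, tclock_disabled=(False, False),
                      rclock_disabled=(False, False)):
    """Return a list of 256 bits; element i is serial bit i of the stream."""
    # Reject the values that Table 9.2 marks as reserved.
    if not 0 <= xmit_dat_pat <= 8:
        raise ValueError("XmitDatPat values 9-15 are reserved")
    if not 0 <= sys_ck_ratio <= 6:
        raise ValueError("SysCkRatio value 7 is reserved")
    if non_block_write not in (0, 2, 3):
        raise ValueError("Non-block Write value 1 is reserved")
    if not 0 <= drv_out <= 3:
        raise ValueError("Drv_Out is a 2-bit field")

    bits = [0] * 256  # reserved bits (0, 12, 19:255) stay zero

    def put(value, lo, hi):
        # The highest-numbered serial bit of a field is most significant.
        for i in range(lo, hi + 1):
            bits[i] = (value >> (i - lo)) & 1

    put(xmit_dat_pat, 1, 4)
    put(sys_ck_ratio, 5, 7)
    bits[8] = int(big_endian)
    put(non_block_write, 9, 10)
    bits[11] = int(timer_int_disabled)
    put(drv_out, 13, 14)
    bits[15], bits[16] = int(tclock_disabled[0]), int(tclock_disabled[1])
    bits[17], bits[18] = int(rclock_disabled[0]), int(rclock_disabled[1])
    return bits
```

For example, build_boot_stream(sys_ck_ratio=1, big_endian=True) selects divide-by-3 and big-endian ordering and leaves every reserved bit at zero.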
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Introduction 

This chapter describes the clock signals ("clocks") used in the R4600/ 
R4700 processor and the processor status reporting mechanism. 

The subject matter includes basic system clocks, system timing 
parameters, connecting clocks to a phase-locked system, connecting 
clocks to a system without phase locking, and processor status outputs. 

Signal Terminology 

The following terminology is used in this chapter (and book) when 
describing signals: 

• Rising edge indicates a low-to-high transition. 

• Falling edge indicates a high-to-low transition. 

• Clock-to-Q delay is the amount of time it takes for a signal to move
from the input of a device (clock) to the output of the device (Q).

Figure 10.1 and Figure 10.2 illustrate these terms.

Figure 10.1 Signal Transitions

Figure 10.2 Clock-to-Q Delay

Basic System Clocks 

The various clock signals used in the R4600/R4700 processor are 
described below, starting with MasterClock, upon which the processor 
bases all internal and external clocking. Note: All output clocks will have 
approximately a 50% duty cycle ± the jitter and any difference in rise and/ 
or fall times. 

MasterClock 

The processor bases all internal and external clocking on the single
MasterClock input signal. The processor generates the clock output
signal, MasterOut, at the same frequency as MasterClock and aligns
MasterOut with MasterClock, if SyncIn is properly connected to
SyncOut.

MasterOut

The processor generates the clock output signal, MasterOut, at the
same frequency as MasterClock and aligns MasterOut with MasterClock,
if SyncIn is properly connected to SyncOut. MasterOut clocks certain
external logic, such as the reset logic.

SyncIn/SyncOut 

The processor generates SyncOut at the same frequency as
MasterClock and aligns SyncIn with MasterClock.

SyncOut must be connected to SyncIn either directly, or through an
external buffer. The processor can compensate for both output driver and
input buffer delays (and, when necessary, delay caused by an external
buffer according to the connections of TClock and RClock to the rest of
the system) when aligning SyncIn with MasterClock. Figure 10.8 on
page 10-9 gives an illustration of SyncOut connected to SyncIn through
an external buffer.

PClock 

The processor generates an internal clock, PClock, at twice the 
frequency of MasterClock and precisely aligns every other rising edge of 
PClock with the rising edge of MasterClock. 

All internal registers and latches use PClock, which is the pipeline clock 
rate. 

SClock 

The R4600/R4700 processor divides PClock by 2, 3, 4, 5, 6, 7, or 8, as
programmed at boot-mode initialization, to generate the internal clock
signal, SClock. The processor uses SClock to sample data at the system
interface and to clock data into the processor system interface output
registers.

The first rising edge of SClock, after ColdReset* is deasserted, is 
aligned with the first rising edge of MasterClock. 

TClock 

TClock (transmit clock) clocks the output registers of an external agent, 
and can be a global system clock for any other logic in the external agent. 

TClock is identical to SClock. The edges of TClock align precisely with
the edges of SClock, and TClock can also be aligned with MasterClock
when SyncIn is properly connected to SyncOut.

RClock 

The external agent uses RClock (receive clock) to clock its input 
registers. The processor generates RClock at the same frequency as 
SClock, although RClock leads TClock and SClock by 25 percent of 
SClock cycle time. 
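
The frequency relationships described in this section can be summarized in a short calculation. The sketch below is illustrative: the function name and the returned dictionary keys are hypothetical, and the divisor argument is assumed to be the value selected by the SysCkRatio boot-mode bits.

```python
def derived_clocks(master_clock_hz, divisor):
    """Derive PClock, SClock and the RClock lead from MasterClock.

    divisor is the PClock-to-SClock divisor (2 through 8).
    """
    if divisor not in range(2, 9):
        raise ValueError("PClock-to-SClock divisor must be 2 through 8")
    pclock_hz = 2 * master_clock_hz   # PClock is twice MasterClock
    sclock_hz = pclock_hz / divisor   # TClock and RClock share this frequency
    sclock_period_ns = 1e9 / sclock_hz
    return {
        "PClock_hz": pclock_hz,
        "SClock_hz": sclock_hz,
        # RClock leads TClock and SClock by 25 percent of the SClock cycle.
        "RClock_lead_ns": 0.25 * sclock_period_ns,
    }
```

With an assumed 50 MHz MasterClock and a divisor of 2, PClock is 100 MHz, SClock is 50 MHz, and RClock leads SClock by 5 ns.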
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Figure 10.3 shows the clocks for a PClock-to-SClock division by 2. 
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Figure 10.3 Processor Clocks, PClock-to-SClock Division by 2 

System Timing Parameters 

As shown in Figure 10.3, data provided to the processor must be stable
a minimum of tDS nanoseconds (ns) before the rising edge of SClock and
be held valid for a minimum of tDH ns after the rising edge of SClock.

Alignment to SClock

Processor data becomes stable a minimum of tDM ns and a maximum of
tDO ns after the rising edge of SClock. This drive-time is the sum of the
maximum delay through the processor output drivers together with the
maximum clock-to-Q delay of the processor output registers.

Alignment to MasterClock

Certain processor inputs (specifically VCCOk, ColdReset*, and Reset*)
are sampled based on MasterClock, while others are output based on
MasterClock. The same setup, hold, and drive-off parameters, tDS, tDH,
tDM, and tDO, shown in Figure 10.3, apply to these inputs and outputs, but
they are measured relative to MasterClock instead of SClock.

Phase-Locked Loop (PLL) 

The processor aligns SyncOut, PClock, SClock, TClock, and RClock
with internal phase-locked loop (PLL) circuits that generate aligned clocks
based on SyncOut/SyncIn. By their nature, PLL circuits are only capable
of generating aligned clocks for MasterClock frequencies within a limited
range.

Clocks generated using PLL circuits contain some inherent inaccuracy,
or jitter; a clock aligned with MasterClock by the PLL can lead or trail
MasterClock by as much as the related maximum jitter specified in the
data sheet.

PLL Components and Operation 

The passive components required for the Phase Locked Loop circuit are 
contained in the packages for the R4600 and R4700. There are no required 
external passive components. 

Passive Components 

The Phase Locked Loop circuit requires several passive components for
proper operation, which are connected to PLLCap0, PLLCap1, VccP, and
VssP, as illustrated in Figure 10.4.

Figure 10.4 PLL Passive Components

It is essential to isolate the analog power and ground for the PLL circuit 
(VccP/VssP) from the regular power and ground (Vcc/Vss). Initial 
evaluations have yielded good results with the following values: 



R  = 5 ohms
C1 = 1 nF
C2 = 82 nF
C3 = 10 µF
Cp = 470 pF



Since the optimum values for the filter components depend upon the 
application and the system noise environment, these values should be 
considered as starting points for further experimentation within your 
specific application.

Figure 10.5 shows the internal PLL and clock distribution network of the
R4600/R4700.

Figure 10.5 R4600/R4700 PLL Network

Connecting Clocks to a Phase-Locked System 

When the processor is used in a phase-locked system, the external agent 
must phase lock its operation to a common MasterClock. In such a 
system, the delivery of data and data sampling have common 
characteristics, even if the components have different delay values. For 
example, transmission time (the amount of time a signal takes to move from 
one component to another along a trace on the board) between any two 
components A and B of a phase-locked system can be calculated from the 
following equation: 

Transmission Time = (SClock period) - (tDO for A) - (tDS for B)
                    - (Clock Jitter for A Max) - (Clock Jitter for B Max)

Figure 10.6 shows a block-level diagram of a phase-locked system using
the R4600/R4700 processor.

Figure 10.6 R4600/R4700 Processor Phase-Locked System

Connecting Clocks to a System without Phase Locking 

When the R4600/R4700 processor is used in a system in which the
external agent cannot lock its phase to a common MasterClock, the
output clocks RClock and TClock can clock the remainder of the system.
Two clocking methodologies are described in this section: connecting to a
gate-array device or connecting to discrete CMOS logic devices.

Connecting to a Gate-Array Device

When connecting to a gate-array device, both RClock and TClock are
used within the gate array. The gate array internally buffers RClock and
uses this buffered version to clock registers that sample processor
outputs.

These sampling registers should be immediately followed by staging
registers clocked by an internally buffered version of TClock. This buffered
version of TClock should be the global system clock for the logic inside the
gate array and the clock for all registers that drive processor inputs.
Figure 10.7 on page 7 is a block diagram of this circuit.

Staging registers place a constraint on the sum of the clock-to-Q delay 
of the sample registers and the setup time of the synchronizing registers 
inside the gate arrays, as shown in the following equation: 

Clock-to-Q Delay + Setup of Synch Register < 0.25 (RClock period)
                    - (Max Clock Jitter for RClock)
                    - (Max Delay Mismatch for Clock Buffers on RClock and TClock)

Figure 10.7 is a block diagram of a system without phase lock, using the
R4600/R4700 processor with an external agent implemented as a gate
array.

Figure 10.7 Gate-Array System Without Phase Lock, Using the
R4600/R4700 Processor

In a system without phase lock, the transmission time for a signal from 
the processor to an external agent composed of gate arrays can be 
calculated from the following equation: 

Transmission Time = (75 percent of TClock period) - (tDO for R4600/R4700)
                    + (Min External Clock Buffer Delay)
                    - (External Sample Register Setup Time)
                    - (Max Clock Jitter for R4600/R4700 Internal Clocks)
                    - (Max Clock Jitter for RClock)

The transmission time for a signal from an external agent composed of
gate arrays to the processor in a system without phase lock can be
calculated from the following equation:

Transmission Time = (TClock period) - (tDS for R4600/R4700)
                    - (Max External Clock Buffer Delay)
                    - (Max External Output Register Clock-to-Q Delay)
                    - (Max Clock Jitter for TClock)
                    - (Max Clock Jitter for R4600/R4700 Internal Clocks)
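
The timing budgets given so far in this chapter (the phase-locked equation, the staging-register constraint, and the two gate-array equations) are plain sums and differences, so they can be evaluated with a few helper functions. The sketch below is illustrative; all function and parameter names are hypothetical, and every argument is assumed to be in the same time unit (for example, nanoseconds), with actual values taken from the relevant data sheets.

```python
def phase_locked_time(sclock_period, t_do_a, t_ds_b, jitter_a, jitter_b):
    """Phase-locked system: transmission time available from A to B."""
    return sclock_period - t_do_a - t_ds_b - jitter_a - jitter_b


def staging_margin(rclock_period, clock_to_q, sync_setup,
                   rclock_jitter, buffer_mismatch):
    """Slack in the staging-register constraint; it must be positive."""
    limit = 0.25 * rclock_period - rclock_jitter - buffer_mismatch
    return limit - (clock_to_q + sync_setup)


def to_gate_array(tclock_period, t_do, min_buffer_delay, sample_setup,
                  internal_jitter, rclock_jitter):
    """No phase lock: transmission time from processor to gate array."""
    return (0.75 * tclock_period - t_do + min_buffer_delay
            - sample_setup - internal_jitter - rclock_jitter)


def from_gate_array(tclock_period, t_ds, max_buffer_delay, max_clock_to_q,
                    tclock_jitter, internal_jitter):
    """No phase lock: transmission time from gate array to processor."""
    return (tclock_period - t_ds - max_buffer_delay - max_clock_to_q
            - tclock_jitter - internal_jitter)
```

A negative result from any of these functions means the budget cannot be met with the supplied values.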

Connecting to a CMOS Logic System 

The processor uses matched delay clock buffers to generate aligned 
clocks to external CMOS logic. A matched delay clock buffer is inserted in 
the SyncOut/SyncIn alignment path of the processor, skewing SyncOut, 
MasterOut, RClock, and TClock to lead MasterClock by the buffer delay 
amount, while leaving PClock aligned with MasterClock. 

The remaining matched delay clock buffers are available to generate a
buffered version of TClock aligned with MasterClock. Alignment error of
this buffered TClock is the sum of the maximum delay mismatch of the
matched delay clock buffers and the maximum clock jitter of TClock.

As the global system clock for the discrete logic that forms the external 
agent, the buffered version of TClock clocks registers that sample 
processor outputs, as well as clocking the registers that drive the processor 
inputs. 

The transmission time for a signal from the processor to an external 
agent composed of discrete CMOS logic devices can be calculated from the 
following equation: 

Transmission Time = (TClock period) - (tDO for R4600/R4700)
                    - (External Sample Register Setup Time)
                    - (Max External Clock Buffer Delay Mismatch)
                    - (Max Clock Jitter for R4600/R4700 Internal Clocks)
                    - (Max Clock Jitter for TClock)

Figure 10.8 is a block diagram of a system without phase lock,
employing the R4600/R4700 processor and an external agent composed of
both a gate array and discrete CMOS logic devices.



Figure 10.8 Gate Array and CMOS System Without Phase Lock, Using
the R4600/R4700 Processor

The transmission time for a signal from an external agent composed of
discrete CMOS logic devices to the processor can be calculated from the
following equation:

Transmission Time = (TClock period) - (tDS for R4600/R4700)
                    - (Max External Output Register Clock-to-Q Delay)
                    - (Max External Clock Buffer Delay Mismatch)
                    - (Max Clock Jitter for R4600/R4700 Internal Clocks)
                    - (Max Clock Jitter for TClock)

In this clocking methodology, the hold time of data driven from the
processor to an external sampling register is a critical parameter. To
guarantee hold time, the minimum output delay of the processor, tDM,
must be greater than the sum of the following:

Min hold time for the external sampling register
+ max clock jitter for R4600/R4700 internal clocks
+ max clock jitter for TClock
+ max delay mismatch of the external clock buffers
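
The hold-time condition above reduces to a single comparison. The sketch below is illustrative only: the function and parameter names are hypothetical, and all arguments are assumed to be in the same time unit, with values taken from the data sheets of the parts involved.

```python
def hold_time_met(t_dm, min_hold, internal_jitter, tclock_jitter,
                  buffer_mismatch):
    """True if the processor's minimum output delay covers the hold budget."""
    required = min_hold + internal_jitter + tclock_jitter + buffer_mismatch
    return t_dm > required
```

The comparison is strict, matching the "must be greater than" wording of the requirement.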
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Introduction 

This chapter describes in detail the cache memory: its place in the 
R4600/R4700 memory organization and individual operations of the 
primary cache. 

This chapter uses the following terminology: 

• The primary cache may also be referred to as the P-cache. 

• The primary data cache may also be referred to as the D-cache. 

• The primary instruction cache may also be referred to as the I-cache. 
These terms are used interchangeably throughout this book. 

Memory Organization 

Figure 11.1 shows the R4600/R4700 system memory hierarchy. In the
logical memory hierarchy, caches lie between the CPU and main memory.
They are designed to make the speedup of memory accesses transparent
to the user. Each functional block in Figure 11.1 has the capacity to hold
more data than the block above it. For instance, physical main memory
has a larger capacity than the primary cache. At the same time, each
functional block takes longer to access than any block above it. For
instance, it takes longer to access data in main memory than in the CPU
on-chip registers.

Figure 11.1 Logical Hierarchy of Memory

The R4600/R4700 processor has two on-chip primary caches: one holds 
instructions (the instruction cache), the other holds data (the data cache).

Overview of Cache Operations

As described earlier, caches provide fast temporary data storage, and 
they make the speedup of memory accesses transparent to the user. In 
general, the processor accesses cache-resident instructions or data 
through the following procedure: 

1. The processor, through the on-chip cache controller, attempts to 
access the next instruction or data in the primary cache. 

2. The cache controller checks to see if this instruction or data is present
in the primary cache.

• If the instruction/data is present, the processor retrieves it. This is
called a primary-cache hit.

• If the instruction/data is not present in the primary cache, it is
retrieved as a cache line from memory and is written into the primary
cache.

3. The processor retrieves the instruction/data from the primary cache
and operation continues. For a data cache miss, the processor can restart
the pipeline after the first doubleword (the one at the miss address) is
retrieved and continues the cache line refill in parallel.

It is possible for the same data to be in two places simultaneously: main
memory and the primary cache. This data is kept consistent through the
use of either a write-back or a write-through methodology. For a write-back
cache, the modified data is not written back to memory until the cache line
is replaced. In a write-through cache, the data is written to memory as the
cached data is modified (with a possible delay due to the write buffer).
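
The difference between the two policies can be shown with a toy model. This sketch is a conceptual illustration rather than a description of the R4600/R4700 hardware: the class and method names are hypothetical, each entry stands for a whole cache line, and the write buffer is ignored.

```python
class TinyCache:
    """Toy cache that tracks one value and one dirty flag per address."""

    def __init__(self, memory, write_through):
        self.memory = memory                # dict standing in for main memory
        self.write_through = write_through  # True: write-through, False: write-back
        self.lines = {}                     # address -> (value, dirty)

    def write(self, addr, value):
        if self.write_through:
            # Write-through: memory is updated as the cached data is modified.
            self.memory[addr] = value
            self.lines[addr] = (value, False)
        else:
            # Write-back: only the cache changes; the line is marked dirty.
            self.lines[addr] = (value, True)

    def evict(self, addr):
        # A dirty line is written back only when it is replaced.
        value, dirty = self.lines.pop(addr)
        if dirty:
            self.memory[addr] = value
```

With write-back, memory keeps the old value until evict is called; with write-through, memory already holds the new value after write returns.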

R4600/R4700 Cache Description 

This section describes the organization of on-chip primary caches. As
Figure 11.1 on page 1 shows, the R4600/R4700 contains separate primary
instruction and data caches.

Figure 11.2 provides block diagrams of the R4600/R4700 memory
model.

Figure 11.2 Cache Support in the R4600/R4700

Cache Line Size 

A cache line is the smallest unit of information that can be fetched from
memory to be filled into the cache. A primary cache line is 8 words in
length, and is represented by a single tag.

Upon a cache miss in the primary cache, the missing cache line is
loaded from memory into the primary cache.

Cache Organization and Accessibility 

This section describes the organization of the primary cache, including 
the manner in which it is mapped, the addressing used to index the cache, 
and composition of the cache lines. The primary instruction and data 
caches are indexed with a virtual address (VA). 

Organization of the Primary Instruction Cache (I-Cache) 

Each line of primary I-cache data (although it is actually an instruction, 
it is referred to as data to distinguish it from its tag) has an associated 28- 
bit tag that contains a 24-bit physical address, a single valid bit, a reserved 
bit, a single parity bit and the FIFO replacement bit. Word parity is used 
on I-cache data. 

The R4600/R4700 processor primary I-cache has the following 
characteristics: 

• two-way set associative 

• indexed with a virtual address 

• checked with a physical tag 

• organized with 8-word (32-byte) cache line. 

Figure 11.3 shows the format of a primary I-cache line. 

Figure 11.3 R4600/R4700 Primary I-Cache Line Format 

Organization of the Primary Data Cache (D-Cache) 

Each line of primary D-cache data has an associated 30-bit tag that 
contains a 24-bit physical address, 2-bit cache line state, a write-back bit, 
a parity bit for the physical address and cache state fields, a parity bit for 
the write-back bit and the FIFO replacement bit. 

The R4600/R4700 processor primary D-cache has the following 
characteristics: 

• write-back or write-through on a per-page basis 

• two-way set associative 

• indexed with a virtual address 

• checked with a physical tag 

• organized with 8-word (32-byte) cache line. 

Figure 11.4 shows the format of a primary D-cache line. 

Figure 11.4 R4600/R4700 8-Word Primary Data Cache Line Format 

In the R4600/R4700, the W (write-back) bit, not the cache state, 
indicates whether or not the primary cache contains modified data that 
must be written back to memory. 

Note: There is no hardware support for cache coherency. Thus the only 
cache states used are Dirty Exclusive and Invalid. 

Accessing the Primary Caches 

Figure 11.5 shows the virtual address (VA) index into the primary 
caches. Each instruction and data cache size is 16 Kbytes. 

Figure 11.5 Primary Cache Data and Tag Organization 
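
The cache geometry given in this chapter (16 Kbytes per cache, two-way set 
associative, 32-byte lines) determines how a virtual address splits into a 
byte offset within a line and a set index. The C sketch below works through 
that arithmetic. The names CACHE_SETS, line_offset and set_index are 
illustrative and do not come from IDT documentation, and the sketch assumes 
the index bits sit directly above the line offset; Figure 11.5 remains the 
authority on the actual bit positions. 

```c
#include <stdint.h>

/* Geometry from the text: 16 Kbytes per cache, two-way, 32-byte lines. */
enum {
    CACHE_BYTES = 16 * 1024,
    CACHE_WAYS  = 2,
    LINE_BYTES  = 32,
    /* 16384 / (2 * 32) = 256 sets, so 8 index bits are needed. */
    CACHE_SETS  = CACHE_BYTES / (CACHE_WAYS * LINE_BYTES)
};

/* Byte offset within the 32-byte line (the low 5 address bits). */
unsigned line_offset(uint64_t va)
{
    return (unsigned)(va % LINE_BYTES);
}

/* Set selected by the virtual address.
 * Assumption: the index is taken from the bits just above the offset. */
unsigned set_index(uint64_t va)
{
    return (unsigned)((va / LINE_BYTES) % CACHE_SETS);
}
```

Under these assumptions, two virtual addresses that are 8 Kbytes apart 
select the same set and compete for its two ways. 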



Cache States 

The terms below are used to describe the state of a cache line: 

• Exclusive: a cache line that is present in exactly one cache in the 
system is exclusive. This is always the case for the R4600/R4700. All 
cache lines are in an exclusive state. 

• Dirty: a cache line that contains data that has changed since it was 
loaded from memory is dirty. 

• Clean: a cache line that contains data that has not changed since it 
was loaded from memory is clean. 

• Shared: a cache line that is present in more than one cache in the 
system is shared. The R4600/R4700 does not provide for hardware cache 
coherency. This state should never happen in normal operations. 

The R4600/R4700 only supports the four cache states shown in 
Table 11.1 on page 6. The only states that will occur in the R4600/R4700 
under normal operations are the Dirty Exclusive and Invalid states. 

Note: Even though valid data is in the Dirty Exclusive state, it may still 
be consistent with memory. One must look at the dirty bit, W, to determine 
if the cache line is to be written back to memory when it is replaced. 

Each primary cache line in the R4600/R4700 system is in one of the 
states described in Table 11.1. 



Cache Line State   Description 
----------------   ------------------------------------------------------ 
Invalid            A cache line that does not contain valid information 
                   must be marked invalid, and cannot be used. A cache 
                   line in any state other than invalid is assumed to 
                   contain valid information. 

Shared             A cache line that is present in more than one cache in 
                   the system is shared. This state will not occur for 
                   normal operations. 

Clean Exclusive    A clean exclusive cache line contains valid information 
                   and this cache line is not present in any other cache. 
                   The cache line is consistent with memory and is not 
                   owned by the processor (see "Cache Line Ownership" on 
                   page 6 in this chapter). This state will not occur for 
                   normal operations. 

Dirty Exclusive    A dirty exclusive cache line contains valid information 
                   and is not present in any other cache. The cache line 
                   may or may not be consistent with memory and is owned 
                   by the processor (see "Cache Line Ownership" on page 6 
                   in this chapter). Use the W bit to determine if the 
                   line must be written back on replacement. 



Table 11.1 Cache States 

Primary Cache States 

Each primary data cache line is normally in one of the following states: 

• invalid 

• dirty exclusive 

Each primary instruction cache line is in one of the following states: 

• invalid 

• valid 

Cache Line Ownership 

The processor is the owner of a cache line when it is in the dirty 
exclusive state and is responsible for the contents of that line. There can 
only be one owner for each cache line. 

The ownership of a cache line is set and maintained through the rules 
described below. 

• A processor assumes ownership of the cache line if the state of the 
primary cache line is dirty exclusive. 

• A processor that owns a cache line is responsible for writing the cache 
line back to memory, if the line is in a write-back page, when the line 
is replaced or during the execution of a Write-back or Write-back 
Invalidate CACHE instruction. The CACHE instruction is explained in 
Appendix A. 

• Memory always owns clean cache lines. 

• The processor gives up ownership of a cache line when the state of the 
cache line changes to invalid. 

Therefore, based on these rules and the fact that any valid data cache 
line is in the Dirty Exclusive state (under normal operating conditions), 
the processor is considered to be the owner of every valid data cache line. 

Cache Write Policy 

The R4600/R4700 processor manages its primary data cache by using 
either a write-back or a write-through policy on a per-page basis. In a 
write-back cache, the data is not written back to memory until the cache 
line is replaced. A write-through policy means the store data is written to 
the cache and to memory. The write of the data to memory may not occur 
at the same time as the write to cache due to the write buffer. 

For a write-back entry, if the cache line is valid and has been modified 
(the W bit is set), the processor writes this cache line back to memory when 
the line is replaced, either in the course of satisfying a cache miss or during 
the execution of a Write-back or Write-back Invalidate CACHE instruction. 

For a write-through entry, whenever a store hits in the cache line, the 
data is also written to memory via the write buffer. The store will not set or 
clear the W bit for a write-through cache line. This is to allow a different 
virtual address that maps to the same physical address and with a write- 
back policy to still set the W bit. For a miss to a write-through line, the 
action taken will be determined by the write-allocation policy. For a write- 
allocate entry, the cache line is first retrieved from memory and the store 
will then continue. A no write-allocate entry will just post the write to the 
system interface, via the write buffer, in the same manner as an uncached 
write. 

When the processor writes a cache line back to memory, it does not 
ordinarily retain a copy of the cache line, and the state of the cache line is 
changed to invalid. However, there are exceptions. For example, the 
processor retains a copy of the cache line if a cache line is written back by 
the Hit Write-back cache instruction. If the W bit is set, the cache line is 
written back and the W bit is cleared. The processor signals this line 
retention during a write by setting SysCmd(2) to a 1, as described in 
Chapter 12. 
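
The interaction between the page write policy and the W bit can be 
summarized in a small software model. The sketch below is not a description 
of the hardware implementation; the type and function names (dline_tag_t, 
store_hit, replace_line) are hypothetical, and the model tracks only the 
two states that occur in normal operation. Clearing W when a line is 
replaced is model housekeeping, not a documented hardware action. 

```c
#include <stdbool.h>

typedef enum { LINE_INVALID, LINE_DIRTY_EXCLUSIVE } line_state_t;
typedef enum { PAGE_WRITE_BACK, PAGE_WRITE_THROUGH } page_policy_t;

/* Only the tag fields that matter for the write policy. */
typedef struct {
    line_state_t state;
    bool         w;      /* W (write-back) bit */
} dline_tag_t;

/* Store that hits a valid line.
 * Returns true if the store data is also sent to memory
 * (through the write buffer). */
bool store_hit(dline_tag_t *tag, page_policy_t policy)
{
    if (policy == PAGE_WRITE_BACK) {
        tag->w = true;   /* line now differs from memory */
        return false;    /* memory is updated later, on write-back */
    }
    /* Write-through: W is neither set nor cleared. */
    return true;
}

/* Line is replaced. Returns true if it must be written back first. */
bool replace_line(dline_tag_t *tag)
{
    bool write_back = (tag->state != LINE_INVALID) && tag->w;
    tag->state = LINE_INVALID;
    tag->w = false;      /* model housekeeping for the emptied slot */
    return write_back;
}
```

The model makes the point of the earlier note explicit: a line in the 
Dirty Exclusive state is written back on replacement only if its W bit is 
set. 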

Cache State Transition Diagrams 

The following sections describe the cache state diagrams that illustrate 
the cache state transitions for the primary cache. Figure 11.6 shows the 
state diagram of the primary cache. 

When an external agent supplies a cache line, it need not return the 
initial state of the cache line for normal operations (see Chapter 12 for a 
definition of an external agent). This is because the only read requests the 
R4600/R4700 should issue are for non-coherent data, and the lower three 
bits of the data identifier are reserved. The initial state will automatically 
be set to Dirty Exclusive (DE) by the R4600/R4700. Otherwise, the processor 
changes the state of the cache line during one of the following events: 

• A line in the dirty exclusive state that receives a store remains in 
the dirty exclusive state. 

• The state is changed to invalid when: 

- A cache invalidate operation is performed on the line. 

- The line is replaced. 

Figure 11.6 Primary Data Cache State Diagram 

Cache Coherency Overview 

Systems using more than one master must have a mechanism to 
maintain data consistency throughout the system. This mechanism is 
called a cache coherency protocol. The R4600/R4700 does not provide 
any hardware cache coherency. Cache coherency must be handled with 
software. 

Cache Coherency Attributes 

Cache coherency attributes are necessary to ensure the consistency of 
data throughout the system. 

Bits in the translation look-aside buffer (TLB) control coherency on a 
per-page basis. Specifically, the TLB contains 3 bits per entry that provide 
two possible coherency attribute types; they are listed below and described 
more fully in the following sections. 

• uncached 

• noncoherent (includes 3 attribute values) 

Table 11.2 summarizes the behavior of the processor on load misses and 
store misses for each of the coherency attribute types listed above. The 
following sections describe these coherency attribute types in detail. 



Attribute Type   Load Miss          Store Miss 
--------------   ----------------   ------------------------------------------ 
Uncached         Main memory read   Main memory write 
Noncoherent      Noncoherent read   Noncoherent read (write-allocate page) 
                                    Main memory write (no write-allocate page) 



Table 11.2 Coherency Attributes and Processor Behavior 

Uncached 

Lines within an uncached page are never in a cache. When a page has 
the uncached coherency attribute, the processor issues a doubleword, 
partial-doubleword, word, or partial-word read or write request directly to 
main memory (bypassing the cache) for any load or store to a location 
within that page. 

Noncoherent 

Lines with a noncoherent attribute type can reside in a cache; a load 
miss causes the processor to issue a noncoherent block read request to a 
location within the cached page. For a store miss to a write-allocate page, 
the processor issues a noncoherent block read request to a location within 
the cached page and then does the write-through. If the page has the no 
write-allocate attribute, a store miss will generate a write to the memory as 
in the uncached case. 
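
The behavior in Table 11.2 can be written as a pair of lookup functions. 
This is a sketch only. The enumerator names are hypothetical and do not 
correspond to the TLB bit encodings, which this section does not list. The 
sketch also assumes that a write-back page allocates on a store miss; the 
text states the write-allocate behavior explicitly only for write-through 
pages. 

```c
/* Page attribute as seen by the cache controller (symbolic, not the
 * TLB encoding). The last three are the noncoherent values. */
typedef enum {
    ATTR_UNCACHED,
    ATTR_WRITE_BACK,
    ATTR_WRITE_THROUGH_ALLOCATE,
    ATTR_WRITE_THROUGH_NO_ALLOCATE
} page_attr_t;

/* Request the processor issues on the system interface. */
typedef enum {
    REQ_MAIN_MEMORY_READ,
    REQ_MAIN_MEMORY_WRITE,
    REQ_NONCOHERENT_BLOCK_READ
} miss_request_t;

miss_request_t request_on_load_miss(page_attr_t attr)
{
    return attr == ATTR_UNCACHED ? REQ_MAIN_MEMORY_READ
                                 : REQ_NONCOHERENT_BLOCK_READ;
}

miss_request_t request_on_store_miss(page_attr_t attr)
{
    switch (attr) {
    case ATTR_WRITE_BACK:              /* assumed to allocate */
    case ATTR_WRITE_THROUGH_ALLOCATE:
        /* Fetch the line first, then let the store continue. */
        return REQ_NONCOHERENT_BLOCK_READ;
    case ATTR_UNCACHED:
    case ATTR_WRITE_THROUGH_NO_ALLOCATE:
        /* Post the write to memory, as for an uncached write. */
        return REQ_MAIN_MEMORY_WRITE;
    }
    return REQ_MAIN_MEMORY_WRITE;      /* not reached for valid input */
}
```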

Cache Operation Modes 

The R4600/R4700 processor only supports the no-secondary-cache 
mode (only uncached and noncoherent coherency attributes are 
applicable) of R4x00 operation. 

R4600/R4700 Processor Synchronization Support 

In a multiprocessor system, it is essential that two or more processors 
working on a common task execute without corrupting each other's 
subtasks. Synchronization, an operation that guarantees an orderly 
access to shared memory, must be implemented for a properly functioning 
multiprocessor system. Two of the more widely used methods are 
discussed in this section: test-and-set, and counter. Even though the 
R4600/R4700 does not support symmetric multi-processing (SMP), these 
are useful for multi-master and heterogeneous multi-processing. 

Test-and-Set 

Test-and-set uses a variable called the semaphore, which protects data 
from being simultaneously modified by more than one processor. In other 
words, a processor can lock out other processors from accessing shared 
data when the processor is in a critical section, a part of the program in 
which no more than a fixed number of processors is allowed to execute. In 
the case of test-and-set, only one processor can enter the critical section. 

Figure 11.7 illustrates a test-and-set synchronization procedure that 
uses a semaphore; when the semaphore is set to 0, the shared data is 
unlocked, and when the semaphore is set to 1, the shared data is locked. 

Figure 11.7 Synchronization with Test-and-Set 

The processor begins by loading the semaphore and checking to see if it 
is unlocked (set to 0) in steps 1 and 2. If the semaphore is not 0, the 
processor loops back to step 1. If the semaphore is 0, indicating the shared 
data is not locked, the processor next tries to lock out any other access to 
the shared data (step 3). If not successful, the processor loops back to step 
1, and reloads the semaphore. 

If the processor is successful at setting the semaphore (step 4), it 
executes the critical section of code (step 5) and gains access to the shared 
data, completes its task, unlocks the semaphore (step 6), and continues 
processing. 
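
The six steps above can be sketched in portable C11 using <stdatomic.h>. 
This is a swapped-in, compiler-level formulation rather than the manual's 
own code; the names tas_lock_t, tas_try_lock, tas_lock and tas_unlock are 
hypothetical. A compiler targeting a MIPS processor would be expected to 
lower the compare-and-exchange to an LL/SC sequence, but that is an 
assumption about the toolchain. 

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Semaphore word: 0 = unlocked, 1 = locked. */
typedef struct {
    atomic_int sem;
} tas_lock_t;

/* One pass through steps 1 to 4. Returns true if the lock was taken. */
bool tas_try_lock(tas_lock_t *lock)
{
    /* Steps 1 and 2: load the semaphore and test for 0. */
    if (atomic_load_explicit(&lock->sem, memory_order_relaxed) != 0)
        return false;

    /* Steps 3 and 4: try to change 0 to 1 atomically. This fails if
     * another processor locked the semaphore in the meantime. */
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &lock->sem, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* Loop back to step 1 until the lock is obtained. */
void tas_lock(tas_lock_t *lock)
{
    while (!tas_try_lock(lock)) {
        /* spin */
    }
}

/* Step 6: unlock the semaphore after the critical section (step 5). */
void tas_unlock(tas_lock_t *lock)
{
    atomic_store_explicit(&lock->sem, 0, memory_order_release);
}
```

A caller brackets its critical section with tas_lock() and tas_unlock(). 
Because the R4600/R4700 has no hardware cache coherency, the semaphore 
word itself must be placed where every master sees the same value, for 
example in an uncached page; that placement is outside this sketch. 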

Counter 

Another common synchronization technique uses a counter. A counter 
is a designated memory location that can be incremented or decremented. 

In the test-and-set method, only one processor at a time is permitted to 
enter the critical section. Using a counter, up to N processors are allowed 
to concurrently execute the critical section. All processors after the Nth 
processor must wait until one of the N processors exits the critical section 
and a space becomes available. 

The counter works by not allowing more than one processor to modify it 
at any given time. Conceptually, the counter can be viewed as a variable 
that counts the number of limited resources (for example, the number of 
processes, or software licenses, etc.). 

Figure 11.8 shows this process. 

Figure 11.8 Synchronization Using a Counter 
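
A counter that admits up to N processors can be sketched with a C11 
compare-and-exchange loop. As with any portable formulation, this is an 
illustration rather than the code of Figure 11.8; counter_try_enter and 
counter_exit are hypothetical names, and the counter is assumed to hold 
the number of free slots, initialized to N. 

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Try to claim one of the free slots.
 * Returns false when all N slots are in use. */
bool counter_try_enter(atomic_int *free_slots)
{
    int seen = atomic_load_explicit(free_slots, memory_order_relaxed);
    while (seen > 0) {
        /* Only one processor at a time succeeds in modifying the
         * counter; on failure 'seen' is refreshed and we retry. */
        if (atomic_compare_exchange_weak_explicit(
                free_slots, &seen, seen - 1,
                memory_order_acquire, memory_order_relaxed))
            return true;
    }
    return false;
}

/* Leave the critical section and give the slot back. */
void counter_exit(atomic_int *free_slots)
{
    atomic_fetch_add_explicit(free_slots, 1, memory_order_release);
}
```

A processor that receives false from counter_try_enter() waits and 
retries until another processor calls counter_exit(). 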



Load Linked and Store Conditional 

The R4600/R4700 instructions Load Linked (LL) and Store Conditional 
(SC) provide support for processor synchronization. These two 
instructions work very much like their simpler counterparts, load and 
store. The LL instruction, in addition to doing a simple load, has the side 
effect of setting a bit called the link bit. This link bit forms a breakable link 
between the LL instruction and the subsequent SC instruction. The SC 
performs a simple store if the link bit is set when the store executes. If the 
link bit is not set, then the store fails to execute. The success or failure of 
the SC is indicated in the target register of the store. 

The link is broken upon completion of an ERET (return from exception) 
instruction. 

The most important features of LL and SC are: 

• They provide a mechanism for generating all of the common 
synchronization primitives including test-and-set, counters, sequencers, 
etc., with no additional overhead. 

• When they operate, bus traffic is generated only if the state of the 
cache line changes; lock words stay in the cache until some other 
processor takes ownership of that cache line. 
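
The link-bit behavior described above can be illustrated with a small 
single-processor software model. This is a sketch for reasoning about the 
semantics, not an emulation of the R4600/R4700: cpu_model_t, model_ll, 
model_sc and model_eret are hypothetical names, and the model includes 
only the one link-breaking event named in the text (ERET). 

```c
#include <stdbool.h>
#include <stdint.h>

/* Processor state relevant to LL/SC. */
typedef struct {
    bool link_bit;
} cpu_model_t;

/* LL: an ordinary load that also sets the link bit. */
uint32_t model_ll(cpu_model_t *cpu, const uint32_t *addr)
{
    cpu->link_bit = true;
    return *addr;
}

/* SC: stores only if the link bit is still set.
 * Returns 1 on success and 0 on failure, standing in for the value
 * left in the target register of the store. */
uint32_t model_sc(const cpu_model_t *cpu, uint32_t *addr, uint32_t value)
{
    if (!cpu->link_bit)
        return 0;
    *addr = value;
    return 1;
}

/* ERET: completing a return from exception breaks the link. */
void model_eret(cpu_model_t *cpu)
{
    cpu->link_bit = false;
}

/* Atomic increment built from the two primitives: retry the whole
 * LL ... SC sequence until the SC succeeds. Returns the old value. */
uint32_t model_fetch_inc(cpu_model_t *cpu, uint32_t *counter)
{
    uint32_t old;
    do {
        old = model_ll(cpu, counter);
    } while (model_sc(cpu, counter, old + 1) == 0);
    return old;
}
```

If an exception is taken between the LL and the SC, the ERET at the end of 
the handler clears the link bit, the SC reports failure, and the retry 
loop repeats the sequence. 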

Examples Using LL and SC 

Figure 11.9 shows how to implement test-and-set using LL and SC 
instructions. 

Figure 11.9 Test-and-Set using LL and SC 

Figure 11.10 shows synchronization using a counter. 

Figure 11.10 Counter Using LL and SC 

Introduction 

The System interface allows the processor to access external resources 
needed to satisfy cache misses and uncached operations, while 
permitting an external agent access to some of the processor internal 
resources. 

This chapter describes the system interface from the point of view of 
both the processor and the external agent. 

Terminology 

The following terms are used in this chapter: 

An external agent is any logic device connected to the processor, over 
the system interface, that allows the processor to issue requests. 

A system event is an event that occurs within the processor and 
requires access to external system resources. 

Sequence refers to the precise series of requests that a processor 
generates to service a system event. 

Protocol refers to the cycle-by-cycle signal transitions that occur on the 
system interface pins to assert a processor or external request. 

Syntax refers to the precise definition of bit patterns on encoded buses, 
such as the command bus. 

System Interface Description 

The R4600/R4700 processor supports a 64-bit address/data interface 
that can be used to construct a simple uniprocessor system with main 
memory. The System interface consists of: 

• 64-bit address and data bus, SysAD 

• 8-bit SysAD check bus, SysADC (even parity only) 

• 9-bit command bus, SysCmd 

• six handshake signals: 

- RdRdy*, WrRdy* 

- ExtRqst*, Release* 

- ValidIn*, ValidOut* 

The processor uses the system interface to access external resources in 
order to service processor requests such as cache misses, cache line 
write-backs, write-through stores and uncached operations. 

Interface Buses 

Figure 12.1 shows the primary communication paths for the system 
interface: a 64-bit address and data bus, SysAD(63:0), and a 9-bit 
command bus, SysCmd(8:0). The SysAD and SysCmd buses are 
bidirectional; that is, they are driven by the processor to issue a processor 
request, and by the external agent to issue an external request (see 
"Processor and External Request Protocols" on page 12-14 for more 
information). 

A request through the system interface consists of: 

• an address 

• a System interface command that specifies the precise nature of the 
request 

• a series of data elements if the request is for a write or read response. 

Figure 12.1 System Interface Buses 

Address and Data Cycles 

Cycles in which the SysAD bus contains a valid address are called 
address cycles. Cycles in which the SysAD bus contains valid data are 
called data cycles. Validity is determined by the state of the ValidIn* and 
ValidOut* signals (described in "Interface Buses" on page 12-2). 

The SysCmd bus identifies the contents of the SysAD bus during any 
cycle in which it is valid. The most significant bit of the SysCmd bus is 
always used to indicate whether the current cycle is an address cycle or a 
data cycle. 

• During address cycles [SysCmd(8) = 0], the remainder of the SysCmd 
bus, SysCmd(7:0), contains a System interface command (the 
encoding of system interface commands is detailed in "System Interface 
Commands and Data Identifiers" on page 12-32). 

• During data cycles [SysCmd(8) = 1], the remainder of the SysCmd 
bus, SysCmd(7:0), contains a data identifier (the encoding of data 
identifiers is detailed later in this chapter). 
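
The top-level split of the SysCmd bus can be expressed as a short decode 
routine. The sketch covers only the address/data distinction described 
above and leaves the 8-bit payload undecoded; syscmd_fields_t and 
decode_syscmd are hypothetical names used for illustration. 

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool    is_data_cycle;  /* SysCmd(8): 0 = address, 1 = data        */
    uint8_t payload;        /* SysCmd(7:0): command or data identifier */
} syscmd_fields_t;

/* Split a sampled 9-bit SysCmd value into its two fields. */
syscmd_fields_t decode_syscmd(uint16_t syscmd)
{
    syscmd_fields_t f;
    f.is_data_cycle = ((syscmd >> 8) & 1u) != 0;
    f.payload       = (uint8_t)(syscmd & 0xFFu);
    return f;
}
```

The caller interprets payload as a System interface command when 
is_data_cycle is false and as a data identifier when it is true. 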

Issue Cycles 

There are two types of processor issue cycles: 

• processor read request issue cycles 

• processor write request issue cycles. 

The processor samples the signal RdRdy* to determine the issue cycle 
for a processor read request; the processor samples the signal WrRdy* to 
determine the issue cycle of a processor write request. 

As shown in Figure 12.2, RdRdy* must be asserted for one clock cycle, 
two cycles prior to the address cycle of the processor read request to 
define the address cycle as the issue cycle (cycle 5 in Figure 12.2). 
RdRdy* does not need to be asserted during the issue cycle. 

Figure 12.2 State of RdRdy* Signal for Read Requests 

As shown in Figure 12.3, WrRdy* must be asserted for one clock cycle, 
two cycles prior to the first address cycle of the processor write request to 
define the address cycle as the issue cycle (cycle 5 in Figure 12.3). 
WrRdy* does not need to be asserted during the issue cycle. 

Note: WrRdy* must be sampled LOW at the end of cycle 3, 
which is marked with an asterisk. 



Figure 12.3 State of WrRdy* Signal for Write Requests 

The processor repeats the address cycle for the request until the 
conditions for a valid issue cycle are met. After the issue cycle, if the processor 
request requires data to be sent, the data transmission begins. There is 
only one issue cycle for any processor request. 
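
The issue-cycle rule can be checked against a recorded signal trace. The 
sketch below assumes an array holding the sampled level of RdRdy* (for a 
read) or WrRdy* (for a write), one entry per cycle, with 0 meaning 
asserted because the signals are active low. The function name and the 
trace representation are hypothetical. 

```c
/* Find the issue cycle of a processor request.
 *
 * rdy_n            sampled RdRdy* or WrRdy* level per cycle (0 = asserted)
 * n_cycles         number of entries in rdy_n
 * first_addr_cycle first cycle in which the processor drives the address
 *
 * The processor repeats the address cycle until the ready signal was
 * asserted two cycles earlier; that address cycle is the issue cycle.
 * Returns the issue cycle, or -1 if none occurs within the trace. */
int find_issue_cycle(const int *rdy_n, int n_cycles, int first_addr_cycle)
{
    for (int cycle = first_addr_cycle; cycle < n_cycles; cycle++) {
        if (cycle >= 2 && rdy_n[cycle - 2] == 0)
            return cycle;
    }
    return -1;
}
```

With the ready signal asserted only in cycle 3 and the address first 
driven in cycle 4, the function reports cycle 5 as the issue cycle, which 
matches the timing described for Figures 12.2 and 12.3. 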

The processor accepts external requests, even while attempting to issue 
a processor request, by releasing the system interface to slave state in 
response to an assertion of ExtRqst* by the external agent. 

Note that the rules governing the issue cycle of a processor request are 
strictly applied to determine the action the processor takes. The 
processor either: 

• completes the issuance of the processor request in its entirety before 
the external request is accepted, or 

• releases the system interface to slave state without completing the 
issuance of the processor request. 

In the latter case, the processor issues the processor request (provided 
the processor request is still necessary) after the external request is 
complete. The rules governing an issue cycle again apply to the processor 
request. 

Handshake Signals 

The processor manages the flow of requests through the following six 
control signals: 

• RdRdy*, WrRdy* are used by the external agent to indicate when it 
can accept a new read (RdRdy*) or write (WrRdy*) transaction. 

• ExtRqst*, Release* are used to transfer control of the SysAD and 
SysCmd buses. ExtRqst* is used by an external agent to indicate a 
need to control the interface. Release* is asserted by the processor 
when it transfers the mastership of the system interface to the 
external agent. 

• The R4600/R4700 processor uses ValidOut* and the external agent 
uses ValidIn* to indicate valid command/data on the SysCmd/SysAD 
buses. 

System Interface Protocols 

Figure 12.4 shows that the system interface operates from register to 
register. That is, processor outputs come directly from output registers 
and begin to change with the rising edge of SClock (see footnote 1). 

Processor inputs are fed directly to input registers that latch these 
input signals with the rising edge of SClock. This allows the system 
interface to run at the highest possible clock frequency. 

Figure 12.4 System Interface Register-to-Register Operation 



1. SClock is an internal clock used by the processor to sample data at the system 
interface and to clock data into the processor system interface output registers; 
see Chapter 10 for more details. 

Master and Slave States 

When the R4600/R4700 processor is driving the SysAD and SysCmd 
buses, the system interface is in master state. When the external agent is 
driving the SysAD and SysCmd buses, the system interface is in slave 
state. 

In master state, the processor drives the SysAD and SysCmd buses and 
will assert the signal ValidOut* whenever these buses are valid. 

In slave state, the external agent drives the SysAD and SysCmd buses 
and asserts the signal ValidIn* whenever these buses are valid. 

Moving from Master to Slave State 

The system interface remains in master state unless one of the following 
occurs: 

• The external agent requests and is granted the system interface 
(external arbitration). 

• The processor issues a read request and performs an uncompelled 
change to slave state. 

External Arbitration 

The system interface must be in slave state for the external agent to 
issue an external request through the system interface. The transition 
from master state to slave state is arbitrated by the processor using the 
system interface handshake signals ExtRqst* and Release*. This 
transition is described by the following procedure: 

1. An external agent signals that it wishes to issue an external request 
by asserting ExtRqst*. 

2. When the processor is ready to accept an external request, it releases 
the system interface from master to slave state by asserting Release* for 
one cycle. 

3. The system interface returns to master state as soon as the issue of 
the external request is complete. 

This process is described in "External Arbitration Protocol" on page 12-24. 

Uncompelled Change to Slave State 

An uncompelled change to slave state is the transition of the system 
interface from master state to slave state, initiated by the processor when 
a processor read request is pending. Release* is asserted automatically 
after a read request. An uncompelled change to slave state occurs during 
the issue cycle of a read request. 

After an uncompelled change to slave state, the processor returns to 
master state at the end of the next external request. This can be a read 
response, or some other type of external request. 

An external agent must note that the processor has performed an 
uncompelled change to slave state and begin driving the SysAD bus along 
with the SysCmd bus. As long as the system interface is in slave state, 
the external agent can begin a single external request without arbitrating 
for the system interface; that is, without asserting ExtRqst*. 

After the external request, the system interface returns to master state. 

Whenever a processor read request is pending, after the issue of a read 
request, the processor automatically switches the system interface to 
slave state, even though the external agent is not arbitrating to issue an 
external request. This transition to slave state allows the external agent 
to quickly return read response data. 
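
The master and slave state transitions described in this section can be 
collected into a two-state machine. The sketch is a behavioral summary 
only; the state and event names are hypothetical, and cycle-level timing 
of Release* is not modeled. 

```c
typedef enum { IF_MASTER, IF_SLAVE } if_state_t;

typedef enum {
    EV_EXTERNAL_GRANT,        /* Release* asserted in response to ExtRqst* */
    EV_READ_REQUEST_ISSUED,   /* uncompelled change after a processor read */
    EV_EXTERNAL_REQUEST_DONE  /* external request (or read response) ends  */
} if_event_t;

/* Next state of the system interface for a given event. */
if_state_t next_state(if_state_t state, if_event_t event)
{
    switch (state) {
    case IF_MASTER:
        /* Both external arbitration and a processor read request
         * hand the buses to the external agent. */
        if (event == EV_EXTERNAL_GRANT || event == EV_READ_REQUEST_ISSUED)
            return IF_SLAVE;
        return IF_MASTER;
    case IF_SLAVE:
        /* The interface returns to the processor after one
         * external request. */
        if (event == EV_EXTERNAL_REQUEST_DONE)
            return IF_MASTER;
        return IF_SLAVE;
    }
    return state;
}
```

The model shows why an external agent that wants to issue a further 
request must arbitrate again: each completed external request returns 
the interface to master state. 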

Processor and External Requests 

There are two broad categories of requests: processor requests and 
external requests. These two categories are described in this section. 

When a system event occurs, the processor issues either a single 
request or a series of requests — called processor requests — through the 
system interface, to access an external resource and service the event. 
For this to work, the processor system interface must be connected to an 
external agent that is compatible with the system interface protocol, and 
can coordinate access to system resources. 

An external agent requesting access to a processor status register 
generates an external request. This access request passes through the 
system interface. System events and request cycles are shown in 
Figure 12.5. 

Figure 12.5 Requests and System Events 

Rules for Processor Requests 

The following rules apply to processor requests. 

• After issuing a processor read request, the processor cannot issue a 
subsequent read request until it has received a read response. 

• After the processor has issued a write request in R4x00 compatible 
write mode (set at boot time), the processor cannot issue a subsequent 
request until at least four cycles after the issue cycle of the write request. 
This means back-to-back write requests with a single data cycle are 
separated by two unused system cycles, as shown in Figure 12.6. 

• After the processor has issued a write request in either of the two new 
write modes, write reissue and pipelined writes, the processor can issue a 
subsequent write immediately, provided the WrRdy* requirement is met. 
This is discussed in more detail later in this chapter. 
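
The spacing rule for R4x00 compatible write mode reduces to simple cycle 
arithmetic. The sketch below covers that mode only; the function names are 
hypothetical, and treating writes with more than one data cycle as having 
no unused cycles is an extrapolation from the "at least four cycles" 
wording, not a statement from this chapter. 

```c
/* R4x00 compatible write mode: the next request may issue no sooner
 * than four cycles after the issue cycle of a write request. */
enum { MIN_CYCLES_BETWEEN_ISSUES = 4 };

/* Earliest cycle in which the following request can issue. */
int earliest_next_issue_cycle(int write_issue_cycle)
{
    return write_issue_cycle + MIN_CYCLES_BETWEEN_ISSUES;
}

/* Unused system cycles between back-to-back writes. The write itself
 * occupies one address cycle plus its data cycles. */
int unused_cycles_between_writes(int data_cycles)
{
    int busy = 1 + data_cycles;
    int idle = MIN_CYCLES_BETWEEN_ISSUES - busy;
    return idle > 0 ? idle : 0;  /* extrapolated for longer writes */
}
```

For a write with a single data cycle the result is two unused cycles, in 
agreement with the rule stated above. 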

Figure 12.6 Back-to-Back Write Cycle Timing 
(R4000 compatible mode) 

Processor Requests 

A processor request is a request or a series of requests, through the 
system interface, to access some external resource. As shown in 
Figure 12.7, processor requests include only reads and writes. 

Figure 12.7 Processor Requests 

Read request asks for a block, doubleword, partial doubleword, word, or 
partial word of data either from main memory or from another system 
resource. 

Write request provides a block, doubleword, partial doubleword, word, 
or partial word of data to be written either to main memory or to another 
system resource. 

Processor requests are managed by the processor in the equivalent of 
the R4000/R4400 no-secondary-cache mode. 

In no-secondary-cache mode, the processor issues requests in a strict 
sequential fashion; that is, the processor is only allowed to have one 
request pending at any time. For example, the processor issues a read 
request and waits for a read response before issuing any subsequent 
requests. The processor submits a write request only if there are no read 
requests pending. 

The processor has the input signals RdRdy* and WrRdy* to allow an 
external agent to manage the flow of processor requests. RdRdy* 
controls the flow of processor read requests, while WrRdy* controls the 
flow of processor write requests. 

The processor request cycle sequence is shown in Figure 12.8. 

Figure 12.8 Processor Request 

Processor Read Request 

When a processor issues a read request, the external agent must access 
the specified resource and return the requested data. (Processor read 
requests are described in this section; external read requests are 
described in "External Requests" on page 12-9.) 

A processor read request can be split from the external agent's return of 
the requested data; in other words, the external agent can initiate an 
unrelated external request before it returns the response data for a 
processor read. A processor read request is completed after the last word 
of response data has been received from the external agent. 

Note that the data identifier (see "System Interface Commands and Data 
Identifiers" on page 12-32) associated with the response data can signal 
that the returned data is erroneous, causing the processor to take a bus 
error. 

Processor read requests that have been issued, but for which data has 
not yet been returned, are said to be pending. A read remains pending 
until the requested read data is returned. 

In no-secondary-cache mode, the external agent must be capable of 
accepting a processor read request any time the following two conditions 
are met: 

• There is no processor read request pending. 

• The signal RdRdy* has been asserted for one clock cycle, two cycles 
before the issue cycle. 

Processor Write Request 

When a processor issues a write request, the specified resource is 
accessed and the data is written to it. (Processor write requests are 
described in this section; external write requests are described in 
"External Requests" on page 12-9.) 

A processor write request is complete after the last word of data has 
been transmitted to the external agent. 

In no-secondary-cache mode, the external agent must be capable of 
accepting a processor write request any time the following two conditions 
are met: 

• No processor read request is pending. 

• The signal WrRdy* has been asserted for one clock cycle, two cycles 
before the issue cycle. 

The R4600/R4700 adds two new modes to enhance the
throughput of non-block writes. These modes allow two-cycle throughput
on back-to-back non-block writes. The actual protocol is discussed in the
write protocol section of this chapter. The external agent must be capable
of accepting a processor write request in these modes under the same
conditions as for the R4x00 compatibility mode (except as explained in
the protocol section).

External Requests 

External requests include read, write and null requests, as shown in 
Figure 12.9. This section also includes a description of read response, a 
special case of an external request. 

Figure 12.9 External Requests

Read request asks for a word of data from the processor's internal 
resource. 

Write request provides a word of data to be written to the processor's 
internal resource. 

Null request requires no action by the processor; it provides a
mechanism for the external agent to return control of the system interface to the
master state without affecting the processor.

The processor controls the flow of external requests through the
arbitration signals ExtRqst* and Release*, as shown in Figure 12.10. The
external agent must acquire mastership of the system interface before it
is allowed to issue an external request; the external agent arbitrates for
mastership of the system interface by asserting ExtRqst* and then
waiting for the processor to assert Release* for one cycle.

Figure 12.10 External Request

Mastership of the system interface always returns to the processor after
an external request is issued. The processor does not accept a
subsequent external request until it has completed the current request.

If there are no processor requests pending, the processor decides, based 
on its internal state, whether to accept the external request, or to issue a 
new processor request. The processor can issue a new processor request 
even if the external agent is requesting access to the system interface. 

The external agent asserts ExtRqst* indicating that it wishes to begin 
an external request. The external agent then waits for the processor to 
signal that it is ready to accept this request by asserting Release*. The 
processor signals that it is ready to accept an external request based on 
the criteria listed below. 

• The processor completes any processor request that is in progress. 

• While waiting for the assertion of RdRdy* to issue a processor read
request, the processor can accept an external request if the request is
delivered to the processor one or more cycles before RdRdy* is
asserted.

• While waiting for the assertion of WrRdy* to issue a processor write
request, the processor can accept an external request provided the
request is delivered to the processor one or more cycles before WrRdy*
is asserted.

• If waiting for the response to a read request after the processor has 
made an uncompelled change to a slave state, the external agent can 
issue an external request before providing the read response data. 

External Read Request 

In contrast to a processor read request, data is returned directly in 
response to an external read request; no other requests can be issued 
until the processor returns the requested data. An external read request 
is complete after the processor returns the requested word of data. 

The data identifier (see "System Interface Commands and Data
Identifiers" on page 12-32) associated with the response data can signal that
the returned data is erroneous, causing the processor to take a bus error.

Note: The R4600/R4700 does not contain any resources that are 
readable by an external read request; in response to an external read 
request the processor returns undefined data and a data identifier with 
its Erroneous Data bit, SysCmd(5), set. 
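
As an illustration of the note above, an external agent model could test the Erroneous Data bit of a returned data identifier as sketched below. This is only a sketch: the function name and the integer representation of the SysCmd value are assumptions made for the example, and only the bit position, SysCmd(5), comes from the text.

```python
ERRONEOUS_DATA_BIT = 5  # SysCmd(5), the Erroneous Data bit named in the note


def data_is_erroneous(syscmd: int) -> bool:
    """Return True when SysCmd(5) is set in a data identifier.

    `syscmd` is assumed to hold the SysCmd bus value of a data
    identifier as a plain integer (hypothetical representation).
    """
    return bool((syscmd >> ERRONEOUS_DATA_BIT) & 1)
```

An agent that issues an external read request to the R4600/R4700 should therefore expect this check to report an error for the response data.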

External Write Request 

When an external agent issues a write request, the specified resource is 
accessed and the data is written to it. An external write request is 
complete after the word of data has been transmitted to the processor. 

The only processor resource available to an external write request is the 
IP field of the Cause register. 

Read Response 

A read response returns data in response to a processor read request, 
as shown in Figure 12.11. While a read response is technically an 
external request, it has one characteristic that differentiates it from all 
other external requests — it does not perform system interface arbitration. 
For this reason, read responses are handled separately from all other 
external requests, and are simply called read responses. When a read 
response comes back with bad parity for the first datum, a cache error 
exception results. 

Figure 12.11 Read Response

Handling Requests 

This section details the sequence, protocol and syntax (see
"Terminology" on page 12-1 for definitions of these terms) of both processor and
external requests. The following system events are discussed:

• load miss (no-secondary-cache mode)

• store miss (no-secondary-cache mode)

• store hit 

• uncached loads/stores 

• CACHE operations 

• load linked/store conditional

Load Miss 

When a processor load misses in the primary cache, before the
processor can proceed it must obtain the cache line that contains the 
data element to be loaded from the external agent. 

If the new cache line replaces a current cache line with a W bit set, the 
current cache line must be written back. 

The processor examines the coherency attribute (cache coherency
attributes are described in Chapter 11) in the TLB entry for the page that
contains the requested cache line, and executes the following request:

• If the coherency attribute is noncoherent, the processor issues a
noncoherent read request.

Table 12.1 shows the actions taken on a load miss to primary cache.

                  State of Data Cache Line Being Replaced
Page Attribute    Clean/Invalid    Dirty (W=1)
Noncoherent       NCR              NCR/W

NCR    Processor noncoherent block read request
NCR/W  Processor noncoherent block read request followed by processor
       block write request

Table 12.1 Load Miss to Primary Cache

No-Secondary-Cache Mode — Load Miss

In no-secondary-cache mode, if the cache line must be written back on 
a load miss, the read request is issued and completed before the write 
request is handled. The processor takes the following steps: 

1. The processor issues a noncoherent read request for the cache line
that contains the data element to be loaded. 

2. The processor then waits for an external agent to provide the read 
response. 

3. The processor will restart the pipeline after the first doubleword (the 
data that missed is fetched first). The rest of the data cache line will be 
placed into the cache in parallel. 

If the current cache line must be written back, the processor issues a 
write request to save the dirty cache line in memory. 

Store Miss 

When a processor store misses in the primary cache, the processor may 
request, from the external agent, the cache line that contains the target 
location of the store for pages that are either write-back or write-through 
with write-allocate only. The processor examines the coherency attribute 
in the TLB entry for the page (TLB page coherency attributes are listed in 
Chapter 4) that contains the requested cache line to see if the line is 
write-allocate or no-write-allocate. 

The processor then executes one of the following requests: 

• If the coherency attribute is noncoherent, write-back or noncoherent, 
write-through with write-allocate, a noncoherent block read request 
is issued. 

• If the coherency attribute is noncoherent, write-through with no 
write-allocate, the processor issues a non-block write request. 

Table 12.2 shows the actions taken on a store miss to the primary
cache.

                                   State of Data Cache Line Being Replaced
Page Attribute                     Clean/Invalid    Dirty (W=1)
Noncoherent, write-back or         NCR              NCR/W
Noncoherent, write-through with
write-allocate
Noncoherent, write-through with    NCW              NA
no write-allocate

NCR    Processor noncoherent block read request
NCR/W  Processor noncoherent block read request followed by processor
       block write request
NCW    Processor noncoherent write request

Table 12.2 Store Miss to Primary Cache

No-Secondary-Cache Mode — Store Miss 

If the coherency attribute is write-back or write-through with
write-allocate, the processor issues a read request for the cache line that contains
the target location of the store, then waits for the external agent to provide
read data in response to the read request. Then, if the current cache line
must be written back, the processor issues a write request for the current
cache line. For a write-through, no write-allocate store miss, the
processor issues a write request only.

In no-secondary-cache mode, if the new cache line replaces a current
cache line whose Write back (W) bit is set, the current cache line moves to
an internal write buffer before the new cache line is loaded in the primary
cache.
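
The request selection summarized in Tables 12.1 and 12.2 can be sketched as a small lookup. This is an illustrative model only: the function name and the attribute strings are hypothetical, and the treatment of the "NA" entry (the replaced line's state is ignored because no line is allocated) is an interpretation of the table, not a statement from it.

```python
def miss_requests(access, page_attribute, victim_dirty):
    """Return the processor requests issued on a primary cache miss.

    access:          "load" or "store"
    page_attribute:  "write-back", "write-through-allocate", or
                     "write-through-no-allocate" (hypothetical names)
    victim_dirty:    True when the line being replaced has W=1

    The result uses the table abbreviations: "NCR" (noncoherent block
    read), "W" (block write of the replaced line), "NCW" (noncoherent
    write).
    """
    if access not in ("load", "store"):
        raise ValueError("access must be 'load' or 'store'")
    # Store miss, write-through with no write-allocate: only a write is
    # issued. Assumption: the victim state does not matter ("NA").
    if access == "store" and page_attribute == "write-through-no-allocate":
        return ["NCW"]
    # All other misses read the line first; a dirty victim is written
    # back after the read (NCR/W).
    return ["NCR", "W"] if victim_dirty else ["NCR"]
```

For example, a store miss to a write-back page that replaces a dirty line yields `["NCR", "W"]`, matching the NCR/W entry in Table 12.2.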

Store Hit 

This section describes store hits in no-secondary-cache mode for both 
write-back and write-through lines. 

No-Secondary-Cache Mode — Store Hit 

In no-secondary-cache mode, the action on the system interface is
determined by whether the line is write-back or write-through. All lines
that use a write-back policy are set to the dirty exclusive cache state, and
no bus transactions are generated. For lines with a write-through
policy, the store also generates a processor write request for the store
data.

Uncached Loads or Stores 

When the processor performs an uncached load, it issues a
noncoherent word read request (the actual access can be for a doubleword,
word, partial word or byte, but the request is called a word read request
to differentiate it from the block read request). When the processor
performs an uncached store, it issues a doubleword, partial doubleword,
word, or partial word write request.

The CPU expects valid parity and data in the full SysAD bus (all 64 
bits), even if it is looking for less than a double word. Even if you do not 
want to return the full double word, you still must tell it not to check the 
parity if you are not using all 64 bits. In other words, either return 64 
bits with parity, or tell it not to check parity. 

All writes by the processor are buffered from the system interface by
the 4-deep write buffer. The write requests are sent to the system
interface when there are no other requests in progress. If the write buffer
contains any entries when a read request is needed (cache miss or uncached
load), the write buffer is flushed before the read request occurs.

Both a data cache miss and an uncached data load will flush the write 
buffer. 
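
The ordering rule described above (buffered writes drain before any read reaches the system interface) can be modeled with a short sketch. The class and method names are hypothetical, and the behavior when the buffer is already full is an assumption made so the model stays simple; the text does not describe that case.

```python
from collections import deque


class WriteBufferModel:
    """Toy model of the 4-deep write buffer ordering rule."""

    DEPTH = 4  # "4-deep write buffer" from the text

    def __init__(self):
        self.pending = deque()  # buffered write addresses, oldest first
        self.bus_log = []       # requests in the order they reach the bus

    def _drain_one(self):
        self.bus_log.append(("write", self.pending.popleft()))

    def store(self, addr):
        # Assumption (not from the text): when the buffer is full, the
        # oldest entry is sent to the bus to make room.
        if len(self.pending) == self.DEPTH:
            self._drain_one()
        self.pending.append(addr)

    def read(self, addr):
        # A cache miss or uncached load flushes the buffer first.
        while self.pending:
            self._drain_one()
        self.bus_log.append(("read", addr))
```

With two buffered stores followed by a load, the bus sees both writes and then the read, in that order.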

CACHE Operations 

The processor provides a variety of CACHE operations to maintain the 
state and contents of the primary cache. During the execution of the 
CACHE operation instructions, the processor can issue write requests. 

Load Linked/Store Conditional Operation

Generally, the execution of a Load Linked/Store Conditional instruction
sequence is not visible at the system interface; that is, no special requests 
are generated due to the execution of this instruction sequence. 

There is, however, one situation in which the execution of a Load 
Linked/Store Conditional instruction sequence is visible, as indicated by 
the link address retained bit during a processor read request, as 
programmed by the SysCmd(2) bit. This situation occurs when the data 
location targeted by a Load Linked/Store Conditional instruction
sequence maps to the same cache line to which the instruction area 
containing the Load Linked/Store Conditional code sequence is mapped. 
In this case, immediately after executing the Load Linked instruction, the 
cache line that contains the link location is replaced by the instruction 
line containing the code. The link address is kept in a register separate 
from the cache, and remains active as long as the link bit, set by the Load 
Linked instruction, is set. 

The link bit, which is set by the load linked instruction, is cleared by a 
change of cache state for the line containing the link address, or by a 
Return From Exception. 

For more information, refer to Chapter 11, or see the specific Load 
Linked and Store Conditional instructions described in Appendix A. 

Processor and External Request Protocols 

The following sections contain a cycle-by-cycle description of the bus 
arbitration protocols for each type of processor and external request. 
Table 12.3 lists the abbreviations and definitions for each of the buses 
that are used in the timing diagrams that follow. 



Scope 


Abbreviation 


Meaning 


Global 


Unsd 


Unused 


SysAD bus 


Addr 


Physical address 


Data<n> 


Data element number n of a block of data 


SysCmd bus 


Cmd 


An unspecified system interface command 


Read 


A processor or external read request command 


Write 


A processor or external write request command 


SINull 


A system interface release external null request 
command 


NData 


A noncoherent data identifier for a data element 
other than the last data element 


NEOD 


A noncoherent data identifier for the last data 
element 



Table 12.3 System Interface Requests 

Processor Request Protocols 

Processor request protocols described in this section include: 

• read 

• write 

Note: In the timing diagrams, the two closely spaced, wavy vertical 
lines (see SCycle 2 in Figure 12.20 on page 12-24) indicate one or more 
identical cycles. 

Processor Read Request Protocol Steps

The following sequence describes the protocol for a processor read 
request (the numbered steps below correspond to the numbers in 
Figure 12.12 on page 12-16). 

1. RdRdy* is asserted low, indicating the external agent is ready to 
accept a read request. 

2. With the system interface in master state, a processor read request is 
issued by driving a read command on the SysCmd bus and a read address 
on the SysAD bus. 

3. At the same time, the processor asserts ValidOut* for one cycle, 
indicating valid data is present on the SysCmd and the SysAD buses. 

Note: Only one processor read request can be pending at a time. 

4. The processor makes an uncompelled change to slave state at the 
issue cycle of the read request by asserting the Release* signal for one 
cycle. 

Note: The external agent must not assert the signal ExtRqst* for the 
purposes of returning a read response, but rather must wait for the 
uncompelled change to slave state. The signal ExtRqst* can be asserted 
before or during a read response to perform an external request other than 
a read response. 

5. The processor releases the SysCmd and the SysAD buses one SCycle 
after the assertion of Release*. 

6. The external agent drives the SysCmd and the SysAD buses within 
two cycles after the assertion of Release*. 

Once in slave state (starting at cycle 5 in Figure 12.12), the external 
agent can return the requested data through a read response. The read 
response can return the requested data or, if the requested data could not
be successfully retrieved, an indication that the returned data is
erroneous. If the returned data is erroneous, the processor takes a bus error
exception.

Note: The R4600/R4700 checks the error bit only for the first
doubleword of read response data; all other error bits are ignored.

Figure 12.12 illustrates a processor read request, coupled with an
uncompelled change to slave state.

Note: Timings for the SysADC and SysCmdP buses are the same as 
those of the SysAD and SysCmd buses, respectively. 

Figure 12.12 Processor Read Request Protocol

The assertion of Release* indicates either an uncompelled change to 
slave state, or a response to the assertion of ExtRqst*, whereupon the 
processor accepts either a read response, or any other external request. 
If any external request other than a read response is issued, the 
processor performs another uncompelled change to slave state after 
processing the external request. 

The actual read response, where the external agent returns the 
requested data, is shown later in this chapter. 

External Instruction Read Response Time 

The R4600/R4700 accesses the external bus because of an instruction cache
miss or an uncached reference. The length of time for an external read is
based on the overhead at the beginning and end of the read along with the
time to drive the address and get the response data.

Instruction Read Latency Steps for System Clock

The read latency for a system clock in the divide-by-two mode is as
follows: 

1. The startup overhead is one to two pipeline cycles (PCycle) for the CPU
to transfer the address to the pads to be output. The second PCycle is
needed if the miss is detected on a PCycle not aligned with the rising edge
of SClock.

2. The CPU drives the address on the SysAD bus for two PCycles. 

3. The CPU tri-states the SysAD bus for two PCycles. 

4. The CPU waits for the main memory to return the data. This is
expressed as n x 2 PCycles.

5. The first double word is driven on the SysAD bus from the main memory
for two PCycles.

6. The remaining three double words of instruction are driven on 
SysAD for 3*2 PCycles. 

Notes on the Instruction Read Latency Steps: 

a. For instruction misses, the pipeline restarts after all the instructions are
returned.

b. n is the total number of idle cycles (including any between instruction
double words). For zero wait state systems, n = 0.

Example of Instruction Block Read With Zero Wait State 

The following example shows an instruction block read with a zero wait
state:

Step  Description                                PCycles
1.    CPU overhead for cache miss detection      1-2
2.    Address driven on SysAD bus                2
3.    SysAD bus tri-stated                       2
4.    Memory latency to return the data          0*2 = 0
5.    First double word driven on SysAD bus      2
6.    Remaining three double words returned      2*3 = 6
      Total PCycles                              13-14
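
The arithmetic in the example can be reproduced for other values of n with a short calculation. The sketch below simply adds up the six steps listed above for divide-by-two mode; the function and constant names are hypothetical, and the result is a (minimum, maximum) pair because the startup overhead is one or two PCycles.

```python
PCYCLES_PER_SCLOCK = 2  # divide-by-two mode, as in the steps above


def instruction_block_read_pcycles(n_idle_sclocks=0):
    """Return (min, max) PCycles for an instruction block read.

    n_idle_sclocks is n from step 4: the total number of idle SClock
    cycles spent waiting for memory (0 for a zero wait state system).
    """
    startup_min, startup_max = 1, 2                      # step 1
    address = 2                                          # step 2
    tri_state = 2                                        # step 3
    memory_wait = n_idle_sclocks * PCYCLES_PER_SCLOCK    # step 4
    first_doubleword = 2                                 # step 5
    remaining_doublewords = 3 * 2                        # step 6
    fixed = (address + tri_state + memory_wait
             + first_doubleword + remaining_doublewords)
    return (startup_min + fixed, startup_max + fixed)
```

Calling `instruction_block_read_pcycles(0)` returns `(13, 14)`, matching the total in the table.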

External Data Read Response Time 

The R4600/R4700 accesses the external bus because of a data cache miss or
an uncached reference. The length of time for an external read is based
on the overhead at the beginning and end of the read along with the time
to drive the address and get the response data.

Data Read Latency Steps for System Clock

The read latency for a system clock in the divide-by-two mode is as 
follows: 

1. The startup overhead is one to two pipeline cycles (PCycle) for the
CPU to generate the parity for the address to be output. The second PCycle
is needed if the miss is detected on a PCycle not aligned with the rising edge
of SClock.

2. The CPU drives the address on the SysAD bus for two PCycles. 

3. The CPU tri-states the SysAD bus for two PCycles. 

4. The CPU waits for the main memory to return the data. This is
expressed as n x 2 PCycles, where n is the number of SClock cycles for the
first data to be returned in a block read, or the latency for the single read.
For a zero wait state memory system, n should be zero.

5. The first double word is driven on the SysAD bus from the main memory
for two PCycles.

6. The end of the overhead is two PCycles: one to transfer the data from 
the pads and generate the parity, and one to write to the register (or cache, 
if it is cacheable data). 

Notes on the Data Read Latency Steps: 

a. If n=0 and the line being replaced is dirty, the CPU takes one to two 
additional PCycles of overhead to move the dirty data into the write 
buffer. 

b. The additional latency for returning the remaining three data 
elements should be added in a similar fashion. 

c. If a cache line needs to be written back, the read request is posted first,
then the write is completed.

Example of Data Single Read With Zero Wait State 

The following example shows a data block read with a zero wait state:

Step  Description                                PCycles
1.    CPU overhead for cache miss detection      1-2
2.    Address driven on SysAD bus                2
3.    SysAD bus tri-stated                       2
4.    Memory latency to return the data          0*2 = 0
5.    First double word driven on SysAD bus      2
6.    CPU overhead to write the data cache,
      do the fixup, and then restart             2
      Total PCycles                              9-10
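
The same bookkeeping can be written out for the data read case. The sketch below adds the six data read latency steps for divide-by-two mode and applies note (a) for a dirty replaced line; the names are hypothetical, and applying the extra one to two PCycles only when n = 0 follows the wording of note (a) literally, which is an interpretation.

```python
PCYCLES_PER_SCLOCK = 2  # divide-by-two mode


def data_read_pcycles(n_sclocks=0, victim_dirty=False):
    """Return (min, max) PCycles until the pipeline restarts on a data read.

    n_sclocks:     n from step 4 (0 for a zero wait state memory system)
    victim_dirty:  True when the line being replaced is dirty
    """
    low, high = 1, 2                           # step 1: startup overhead
    fixed = (2                                 # step 2: address on SysAD
             + 2                               # step 3: SysAD tri-stated
             + n_sclocks * PCYCLES_PER_SCLOCK  # step 4: memory latency
             + 2                               # step 5: first double word
             + 2)                              # step 6: end overhead
    low, high = low + fixed, high + fixed
    # Note (a): with n = 0 and a dirty victim, moving the dirty data to
    # the write buffer costs one to two additional PCycles.
    if victim_dirty and n_sclocks == 0:
        low, high = low + 1, high + 2
    return (low, high)
```

With the defaults this returns `(9, 10)`, the total shown in the example; the remaining three data elements of a block are not counted, in line with note (b).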

External Cycles for Read Latency 

The external cycles to get the response data will look similar to Figure 
12.13. For a larger "divide by" it will take longer to get the response data. 

Figure 12.13 Uncached Read — External Cycles

The same operation is shown in greater detail in Figure 12.14. These
figures assume the following:

1. Data is returned immediately after the Release* is asserted, and after 
the bus turn-around cycle (when the CPU tri-states the bus to allow the 
external agent to drive it). 

2. The data meets the setup and hold requirements for the rising edge 
of the SClock that is identified in the preceding and following figures with 
an asterisk. 

Figure 12.14 Processor Read Cycle

Processor Write Request Protocol 

Processor write requests are issued using one of two protocols. 

• Doubleword, partial doubleword, word, or partial word writes use a
word write request protocol (see footnote 1).

• Block writes use a block write request protocol. 

Processor word write requests are issued with the system interface in 
master state, as described in the following steps. Figure 12.15 shows a 
processor noncoherent word write request cycle. 

1. A processor single word write request is issued by driving a write
command on the SysCmd bus and a write address on the SysAD bus. 

2. The processor asserts ValidOut*. 

3. The processor drives a data identifier on the SysCmd bus and data 
on the SysAD bus. 

4. The data identifier associated with the data cycle must contain a last 
data cycle indication. At the end of the cycle, ValidOut* is deasserted. 

Note: Timings for the SysADC and SysCmdP buses are the same as 
those of the SysAD and SysCmd buses, respectively. 



Footnote 1: Called "word" to distinguish it from the block request protocol. Data
transferred can actually be doubleword, partial doubleword, word, or partial word.

Figure 12.15 Processor Noncoherent Word Write Request Protocol

The R4600/R4700 interface requires that WrRdy* be asserted two 
system cycles prior to the issue of a write, for one clock cycle. An external 
agent that deasserts WrRdy* immediately upon receiving the write that 
fills its buffer will stop a subsequent write for four system cycles in R4000 
non-block write compatible mode. This leaves two null system cycles after 
a write address/data pair to give the external agent time to stop the next 
write. This is illustrated in Figure 12.6 on page 12-7. 

An address/data pair every four system cycles is not sufficiently high
performance for all applications. For this reason, the R4600/R4700
provides two new protocol options that modify the R4000 back-to-back
write protocol to allow an address/data pair every two system cycles. The
first protocol, called write re-issue, allows WrRdy* to be deasserted during 
the address cycle and forces a write to be re-issued. The second, called 
pipelined writes, leaves the sample point of WrRdy* unchanged and 
requires that the external agent accept one more write than the R4000 
protocol. 

The write re-issue protocol is shown in Figure 12.16. Writes issue when
WrRdy* is asserted both two cycles prior to the address cycle and during 
the address cycle. 

Figure 12.16 Write re-issue

The pipelined write protocol is shown in Figure 12.17. This protocol 
maintains the R4000 write issue rule (issue if WrRdy* asserted two cycles 
prior to the address cycle, for one clock cycle), but simply eliminates the 
two null cycles between writes. The external agent is then required to 
accept one more write after it deasserts WrRdy*. 

Figure 12.17 Pipelined Writes

All three write protocols apply to both single writes and block writes.
This means that in pipelined write mode, for example, a single write can be
followed immediately by a block write that the external agent must
accept.
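
The issue rules of the three write protocols can be compared with a small model. The sketch below is illustrative only: the mode strings, the function name, and the representation of WrRdy* as a per-SCycle list of booleans (True meaning asserted) are all assumptions made for the example.

```python
# Minimum spacing of address/data pairs for back-to-back non-block
# writes, in system cycles, as stated in the text above.
MIN_WRITE_SPACING = {"r4000": 4, "reissue": 2, "pipelined": 2}


def write_can_issue(wrrdy_asserted, addr_cycle, mode):
    """Decide whether a write whose address cycle is addr_cycle issues.

    wrrdy_asserted: list of bool per SCycle, True where WrRdy* is asserted
    mode:           "r4000", "reissue", or "pipelined" (hypothetical names)
    """
    if mode not in MIN_WRITE_SPACING:
        raise ValueError("unknown write mode: %r" % (mode,))
    # All modes sample WrRdy* two cycles before the address cycle.
    two_before = addr_cycle >= 2 and wrrdy_asserted[addr_cycle - 2]
    if mode == "reissue":
        # Write re-issue also requires WrRdy* during the address cycle;
        # otherwise the write must be issued again.
        return two_before and wrrdy_asserted[addr_cycle]
    # R4000 compatible and pipelined modes keep the original sample point.
    return two_before
```

For a trace in which WrRdy* is deasserted in the address cycle itself, the pipelined mode still issues the write (the external agent must accept it), while write re-issue mode does not.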

Processor block write requests are issued with the system interface in 
master state, as described below; a processor noncoherent block request 
for eight words of data is illustrated in Figure 12.18 on page 12-22.

1. The processor issues a write command on the SysCmd bus and a
write address on the SysAD bus.

2. The processor asserts ValidOut*. 

3. The processor drives a data identifier on the SysCmd bus and data 
on the SysAD bus. 

4. The processor asserts ValidOut* for a number of cycles sufficient to 
transmit the block of data. 

5. The data identifier associated with the last data cycle must contain a 
last data cycle indication. 

Figure 12.18 illustrates a processor noncoherent block request for eight
words of data with a data pattern of DDDD.

Figure 12.18 Processor Noncoherent Block Write Request Protocol

Processor Request and Flow Control 

The external agent uses RdRdy* to control the flow of processor read 
requests. Figure 12.19 on page 12-23 illustrates this flow control, as 
described in the steps below. 

1. The processor samples the signal RdRdy* to determine if the external 
agent is capable of accepting a read request. 

2. The signal WrRdy* controls the flow of a processor write request. 

3. The processor does not complete the issue of a read request, until it 
issues an address cycle in response to the request for which the signal 
RdRdy* was asserted two cycles earlier. 

4. The processor does not complete the issue of a write request until it 
issues an address cycle in response to the write request for which the 
signal WrRdy* was asserted two cycles earlier. 
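
The flow control rule in steps 3 and 4 can be sketched as a search for the first cycle in which a waiting request may issue. This is a simplified model under stated assumptions: the ready signal (RdRdy* or WrRdy*) is given as a list of booleans per cycle, True meaning asserted, and the function name is hypothetical.

```python
def earliest_issue_cycle(rdy_asserted, request_ready_cycle):
    """Return the first cycle in which the address cycle can be issued.

    rdy_asserted:         list of bool per cycle for RdRdy* or WrRdy*
    request_ready_cycle:  first cycle in which the processor has a
                          request waiting to issue
    Returns None if the request cannot issue within the trace.
    """
    for cycle in range(max(request_ready_cycle, 2), len(rdy_asserted)):
        # The request issues only if the ready signal was asserted
        # two cycles before this candidate address cycle.
        if rdy_asserted[cycle - 2]:
            return cycle
    return None
```

If the ready signal is first asserted in cycle 3, a request that has been waiting since cycle 1 issues in cycle 5.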

Figure 12.19 illustrates two processor write requests in which the issue
of the second is delayed for the assertion of WrRdy*. 

Note: Timings for the SysADC and SysCmdP buses are the same as 
those of the SysAD and SysCmd buses, respectively. 

Figure 12.19 Two Processor Write Requests, Second Write Delayed for the Assertion of WrRdy*

External Request Protocols 

External requests can only be issued with the system interface in slave 
state. An external agent asserts ExtRqst* to arbitrate (see "External 
Arbitration Protocol" on page 12-24) for the system interface, then waits 
for the processor to release the system interface to slave state by 
asserting Release* before the external agent issues an external request. 
If the system interface is already in slave state — that is, the processor has 
previously performed an uncompelled change to slave state — the external 
agent can begin an external request immediately. 

After issuing an external request, the external agent must return the 
system interface to master state. If the external agent does not have any 
additional external requests to perform, ExtRqst* must be deasserted 
two cycles after the cycle in which Release* was asserted. For a string of 
external requests, the ExtRqst* signal is asserted until the last request 
cycle, whereupon it is deasserted two cycles after the cycle in which 
Release* was asserted. 

The processor continues to handle external requests as long as 
ExtRqst* is asserted; however, the processor cannot release the system 
interface to slave state for a subsequent external request until it has 
completed the current request. As long as ExtRqst* is asserted, the 
string of external requests is not interrupted by a processor request. 

This section describes the following external request protocols: 

• read 

• null 

• write 

• read response 

External Arbitration Protocol

System interface arbitration uses the signals ExtRqst* and Release* as 
described above. Figure 12.20 is a timing diagram of the arbitration 
protocol, in which slave and master states are shown. 

The arbitration cycle consists of the following steps: 

1. The external agent asserts ExtRqst* when it wishes to submit an 
external request. 

2. The processor waits until it is ready to handle an external request, 
whereupon it asserts Release* for one cycle. 

3. The processor sets the SysAD and SysCmd buses to tri-state. 

4. The external agent must begin driving the SysAD bus and the 
SysCmd bus two cycles after the assertion of Release*. 

5. The external agent deasserts ExtRqst* two cycles after the assertion 
of Release*, unless the external agent wishes to perform an additional 
external request. 

6. The external agent sets the SysAD and the SysCmd buses to tri-state 
at the completion of an external request. 

The processor can start issuing a processor request one cycle after the 
external agent sets the bus to tri-state. 
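
The cycle relationships in the arbitration steps above can be collected into a small timeline helper. The sketch below only restates the two-cycle offsets from steps 4 and 5 relative to the cycle in which Release* is asserted; the function name and the dictionary keys are hypothetical.

```python
def arbitration_timeline(release_cycle, more_requests=False):
    """Return key cycles of one arbitration, relative to Release*.

    release_cycle:  cycle in which the processor asserts Release*
    more_requests:  True if the external agent keeps ExtRqst* asserted
                    to perform an additional external request
    """
    return {
        "release_asserted": release_cycle,
        # Step 4: the agent begins driving SysAD and SysCmd two cycles
        # after the assertion of Release*.
        "agent_drives_buses": release_cycle + 2,
        # Step 5: ExtRqst* is deasserted two cycles after Release*,
        # unless another external request follows.
        "extrqst_deasserted": None if more_requests else release_cycle + 2,
    }
```

For example, with Release* asserted in cycle 4, the external agent starts driving the buses in cycle 6 and, if it has nothing further to request, deasserts ExtRqst* in that same cycle.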

Note: Timings for the SysADC and SysCmdP buses are the same as 
those of the SysAD and SysCmd buses, respectively. 




Figure 12.20 Arbitration Protocol for External Requests 

External Read Request Protocol 

External reads are requests for a word of data from a processor internal 
resource, such as a register. External read requests cannot be split; that 
is, no other request can occur between the external read request and its 
read response. 

Figure 12.21 shows a timing diagram of an external read request, which
consists of the following steps: 

1. An external agent asserts ExtRqst* to arbitrate for the system 
interface. 

2. The processor releases the system interface to slave state by asserting 
Release* for one cycle and then deasserting Release*. 

3. After Release* is deasserted, the SysAD and SysCmd buses are set 
to a tri-state for one cycle. 

4. The external agent drives a read request command on the SysCmd 
bus and a read request address on the SysAD bus and asserts ValidIn* for
one cycle. 

5. After the address and command are sent, the external agent releases 
the SysCmd and SysAD buses by setting them to tri-state and allowing the 
processor to drive them. The processor, having accessed the data that is 
the target of the read, returns this data to the external agent. The 
processor accomplishes this by driving a data identifier on the SysCmd 
bus, the response data on the SysAD bus, and asserting ValidOut* for one 
cycle. The data identifier indicates that this is last-data-cycle response 
data. 

6. The system interface is in master state. The processor continues 
driving the SysCmd and SysAD buses after the read response is returned. 

Note: Timings for the SysADC and SysCmdP buses are the same as 
those of the SysAD and SysCmd buses, respectively. 

External read requests are only allowed to read a word of data from the 
processor. The processor response to external read requests for any data 
element other than a word is undefined. 

Note: The processor does not contain any resources that are readable by an external read 
request; in response to an external read request the processor returns undefined data and 
a data identifier with its Erroneous Data bit, SysCmd (5), set. 



Figure 12.21 External Read Request, System Interface in Master State 

External Null Request Protocol 

The R4600/R4700 only supports one external null request. A system 
interface release external null request returns the system interface to 
master state from slave state without otherwise affecting the processor. 

External null requests require no action from the processor other than 
to return the system interface to master state. 

Figure 12.22 shows a timing diagram of the external null request cycle, 
which consists of the following steps: 

1. The external agent asserts ExtRqst* to arbitrate for the system 
interface. 

2. The processor releases the system interface to slave state by asserting 
Release*. 

3. The external agent drives a system interface release external null 
request command on the SysCmd bus, and asserts ValidIn* for one cycle 
to return the system interface back to master state. 

4. The SysAD bus is unused (does not contain valid data) during the 
address cycle associated with an external null request. 

5. After the address cycle is issued, the null request is complete. 

For a system interface release external null request the external agent 
releases the SysCmd and SysAD buses, and expects the system interface 
to return to master state. 



[Timing diagram. Signals shown: SCycle, SClock, SysAD Bus, SysCmd Bus, 
ValidOut*, ValidIn*, ExtRqst*, Release*. Interface states marked: Slave, Master.] 

Figure 12.22 System Interface Release External Null Request 

External Write Request Protocol 

External write requests use a protocol identical to the processor single 
word write protocol except the ValidIn* signal is asserted instead of 
ValidOut*. Figure 12.23 on page 12-27 shows a timing diagram of an 
external write request, which consists of the following steps: 

1. The external agent asserts ExtRqst* to arbitrate for the system 
interface. 

2. The processor releases the system interface to slave state by asserting 
Release*. 

3. The external agent drives a write command on the SysCmd bus, a 
write address on the SysAD bus, and asserts ValidIn*. 

4. The external agent drives a data identifier on the SysCmd bus, data 
on the SysAD bus, and asserts ValidIn*. 

5. The data identifier associated with the data cycle must contain a 
coherent or noncoherent last data cycle indication. 

6. After the data cycle is issued, the write request is complete and the 
external agent sets the SysCmd and SysAD buses to a tri-state, allowing 
the system interface to return to master state. Timings for the SysADC 
and SysCmdP buses are the same as those of the SysAD and SysCmd 
buses, respectively. 

External write requests are only allowed to write a word of data to the 
processor. Processor behavior in response to an external write request for 
any data element other than a word is undefined. 

Figure 12.23 External Write Request with System Interface Initially in Master State 

Read Response Protocol 

An external agent must return data to the processor in response to a 
processor read request by using a read response protocol. A read 
response protocol consists of the following steps: 

1. The external agent waits for the processor to perform an uncompelled 
change to slave state. 

2. The external agent returns the data through a single data cycle or a 
series of data cycles. 

3. After the last data cycle is issued, the read response is complete and 
the external agent sets the SysCmd and SysAD buses to a tri-state. 

4. The system interface returns to master state. 

Note: The processor always performs an uncompelled change to slave 
state in the same cycle that it issues a read request. 

5. The data identifier for data cycles must indicate the fact that this data 
is response data. 

6. The data identifier associated with the last data cycle must contain a 
last data cycle indication. 

For read responses to noncoherent block read requests (which is the 
only read request for normal operations of the R4600/R4700), the 
response data does not need to identify an initial cache state. The 
R4600/R4700 automatically assigns the cache state as dirty exclusive. 

The data identifier associated with a data cycle can indicate that the 
data transmitted during that cycle is erroneous; however, an external 
agent must return a data block of the correct size regardless of the fact 
that the data may be in error. The R4600/R4700 only checks the error bit 
for the first doubleword of a block; the other error bits for the block of 
data are ignored. If an initial erroneous data cycle is detected, the 
processor takes a bus error at the completion of the data transfer. 

Read response data must only be delivered to the processor when a 
processor read request is pending. The behavior of the processor is 
undefined when a read response is presented to it and there is no 
processor read pending. 

Figure 12.24 illustrates a processor word read request followed by a 
word read response. Figure 12.25 illustrates a read response for a 
processor block read with the system interface already in slave state. 
Figure 12.26 illustrates a block read transaction with one wait state. 

Note: Timings for the SysADC and SysCmdP buses are the same as 
those of the SysAD and SysCmd buses, respectively. 

Figure 12.24 Processor Word Read Request, followed by a Word Read Response 

Figure 12.25 Block Read Response With Zero Wait State 








[Timing diagram. In master state the processor drives Addr on the SysAD bus 
and Read on the SysCmd bus; in slave state the external agent returns Data0 
through Data3 with data identifiers NData, NData, NData, NEOD; the interface 
then returns to master state. Signals shown: SCycle, SClock, SysAD Bus, 
SysCmd Bus, ValidOut*, ValidIn*, ExtRqst*, Release*, RdRdy*.] 

Figure 12.26 Block Read Transaction With One Wait State 

Data Rate Control 

The system interface supports a maximum data rate of one doubleword 
per cycle. The data rate the processor can support is directly related to 
the rate at which the external agent can accept data. 

Read Data Pattern 

The rate at which data is delivered to the processor can be determined 
by the external agent; for example, the external agent can drive data and 
assert ValidIn* every n cycles, instead of every cycle. An external agent 
can deliver data at any rate it chooses, but must not deliver data to the 
processor any faster than the processor is capable of receiving it. 

The processor only accepts cycles as valid when ValidIn* is asserted 
and the SysCmd bus contains a data identifier. If the external agent 
sends more data items than requested (e.g., a fifth doubleword of read 
response data with ValidIn* asserted), or the last data item (i.e., the fourth 
doubleword) of a block read is not tagged as the last data item, it is an 
error and the resulting actions of the processor for these cases are 
undefined. 
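
The two error cases above can be expressed as a small check on the external agent side. The following Python sketch is illustrative only: the function name and the list-of-SysCmd-values input are assumptions, and it relies on the data identifier encoding described later in this chapter (SysCmd(8) = 1 for a data identifier, SysCmd(7) = 0 for the last data element).

```python
DATA_ID_BIT = 8    # SysCmd(8): 1 = data identifier
LAST_DATA_BIT = 7  # SysCmd(7): 0 = last data element (assumed encoding)

def check_block_response(syscmd_cycles, expected_doublewords=4):
    """Return a list of problems for the SysCmd values seen with ValidIn* asserted.

    Hypothetical helper: only cycles carrying a data identifier are counted,
    mirroring the rule that the processor accepts a cycle as valid only when
    ValidIn* is asserted and SysCmd holds a data identifier.
    """
    problems = []
    data_ids = [c for c in syscmd_cycles if (c >> DATA_ID_BIT) & 1]
    if len(data_ids) != expected_doublewords:
        problems.append("expected %d data cycles, saw %d"
                        % (expected_doublewords, len(data_ids)))
    for index, ident in enumerate(data_ids):
        is_last = ((ident >> LAST_DATA_BIT) & 1) == 0
        should_be_last = index == expected_doublewords - 1
        if is_last != should_be_last:
            problems.append("data cycle %d: last-data tag is wrong" % index)
    return problems

# Example identifiers: not-last and last response data (other bits zero).
NDATA = 0b110000000
NEOD = 0b100000000
```

An empty result means the response has the expected length and only its final doubleword is tagged as last; anything else corresponds to a case the manual declares undefined.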

Figure 12.27 shows a read response with reduced data rate and with 
the system interface in slave state. 

Figure 12.27 Read Response, Reduced Data Rate, System Interface in Slave State 

Write Data Transfer Patterns 

The write data pattern specifies the pattern the R4600/R4700 uses 
when writing a block to the external agent. This pattern is specified 
through the mode bits. 

A data pattern is a sequence of letters indicating the data and unused 
cycles that repeat to provide the appropriate data rate. For example, the 
data pattern DDxx specifies a repeatable data rate of two doublewords 
every four cycles, with the last two cycles unused. 

Table 12.4 lists the maximum processor data rate and the data pattern 
for each data rate. 



Maximum Data Transmit Rate, Block Writes    Data Pattern
1 Double/1 SClock Cycle                     DDDD
2 Doubles/3 SClock Cycles                   DDxDDx
1 Double/2 SClock Cycles                    DDxxDDxx
1 Double/2 SClock Cycles                    DxDxDxDx
2 Doubles/5 SClock Cycles                   DDxxxDDxxx
1 Double/3 SClock Cycles                    DDxxxxDDxxxx
1 Double/3 SClock Cycles                    DxxDxxDxxDxx
1 Double/4 SClock Cycles                    DDxxxxxxDDxxxxxx
1 Double/4 SClock Cycles                    DxxxDxxxDxxxDxxx


Table 12.4 Transmit Data Rates and Patterns 

In Table 12.4 data patterns are specified using the letters D and x; D 
indicates a data cycle and x indicates an unused cycle. During the 
unused cycles, the data bus will maintain the last data value (D). 
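
The relationship between a pattern string and its data rate can be checked with a short calculation. The Python sketch below is illustrative; the function names are hypothetical, and it assumes a block write of four doublewords (an 8-word block) that starts at the beginning of the pattern.

```python
from fractions import Fraction

def data_rate(pattern):
    """Doublewords per SClock cycle for a repeating D/x pattern."""
    if not pattern or set(pattern) - {"D", "x"}:
        raise ValueError("pattern must consist of 'D' and 'x' only")
    return Fraction(pattern.count("D"), len(pattern))

def cycles_for_block(pattern, doublewords=4):
    """SClock cycles from the first data cycle through the last data cycle."""
    if "D" not in pattern:
        raise ValueError("pattern contains no data cycle")
    cycles = 0
    sent = 0
    while sent < doublewords:
        # The pattern repeats, so index it modulo its length.
        if pattern[cycles % len(pattern)] == "D":
            sent += 1
        cycles += 1
    return cycles
```

For example, DDxx yields one doubleword every two cycles, and a four-doubleword block occupies six cycles (D D x x D D), because the trailing unused cycles of the second repetition are not needed.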

Independent Transmissions on the SysAD Bus 

In most applications, the SysAD bus is a point-to-point connection, 
running from the processor to a bidirectional registered transceiver 
residing in an external agent. For these applications, the SysAD bus has 
only two possible drivers, the processor or the external agent. 

Certain applications may require connection of additional drivers and 
receivers to the SysAD bus, to allow transmissions over the SysAD bus 
that the processor is not involved in. These are called independent 
transmissions. To effect an independent transmission, the external agent 
must coordinate control of the SysAD bus by using arbitration handshake 
signals and external null requests. 

An independent transmission on the SysAD bus follows this procedure: 

1. The external agent requests mastership of the SysAD bus, to issue an 
external request. 

2. The processor releases the system interface to slave state. 

3. The external agent then allows the independent transmission to take 
place on the SysAD bus, making sure that ValidIn* is not asserted while 
the transmission is occurring. 

4. When the transmission is complete, the external agent must issue a 
system interface release external null request to return the system interface 
to master state. 

System Interface Endianness 

The endianness of the system interface is programmed at boot time 
through the boot-time mode control interface (see Chapter 9, Initialization 
Interface), and remains fixed until the next time the processor boot-time 
mode bits are read. Software cannot change the endianness of the system 
interface and the external system; software can set the reverse endian bit 
to reverse the interpretation of endianness inside the processor, but the 
endianness of the system interface remains unchanged. 

System Interface Cycle Time 

The processor specifies minimum and maximum cycle counts for 
various processor transactions and for the processor response time to 
external requests. Processor requests themselves are constrained by the 
system interface request protocol, and request cycle counts can be 
determined by examining the protocol. The following system interface 
interactions can vary within minimum and maximum cycle counts: 

• waiting period for the processor to release the system interface to 
slave state in response to an external request (release latency) 

• response time for an external request that requires a response 
(external response latency). 

The remainder of this section describes and tabulates the minimum and 
maximum cycle counts for these system interface interactions. 

Release Latency 

Release latency is generally defined as the number of cycles the 
processor can wait to release the system interface to slave state for an 
external request. When no processor requests are in progress, internal 
activity can cause the processor to wait some number of cycles before 
releasing the system interface. Release latency is therefore more 
specifically defined as the number of cycles that occur between the 
assertion of ExtRqst* and the assertion of Release*. 

There are three categories of release latency: 

• Category 1: when the external request signal is asserted two cycles 
before the last cycle of a processor request. 

• Category 2: when the external request signal is not asserted during a 
processor request, or is asserted during the last cycle of a processor 
request. 

• Category 3: when the processor makes an uncompelled change to 
slave state. 

Table 12.5 summarizes the minimum and maximum release latencies 
for requests that fall into categories 1, 2 and 3. Note that the maximum 
and minimum cycle count values are subject to change. 



Category    Minimum PCycles    Maximum PCycles
1           4                  6
2           4                  24
3           0                  0


Table 12.5 Release Latency for External Requests 

The differences in the minimum and maximum times are due to 
internal conditions not readily observable externally. 

System Interface Commands and Data Identifiers 

System interface commands specify the nature and attributes of any 
system interface request; this specification is made during the address 
cycle for the request. System interface data identifiers specify the 
attributes of data transmitted during a system interface data cycle. 

The following sections describe the syntax, that is, the bitwise encoding 
of system interface commands and data identifiers. 

Reserved bits and reserved fields in the command or data identifier 
should be set to 1 for system interface commands and data identifiers 
associated with external requests. For system interface commands and 
data identifiers associated with processor requests, reserved bits and 
reserved fields in the command and data identifier are undefined. 

Command and Data Identifier Syntax 

System interface commands and data identifiers are encoded in 9 bits 
and are transmitted on the SysCmd bus from the processor to an 
external agent, or from an external agent to the processor, during address 
and data cycles. Bit 8 (the most-significant bit) of the SysCmd bus 
determines whether the current content of the SysCmd bus is a command or a 
data identifier and, therefore, whether the current cycle is an address 
cycle or a data cycle. For system interface commands, SysCmd(8) must 
be set to 0. For system interface data identifiers, SysCmd(8) must be set 
to 1. 

System Interface Command Syntax 

This section describes the SysCmd bus encoding for system interface 
commands. Figure 12.28 shows a common encoding used for all system 
interface commands. 

[Bit layout of SysCmd(8:0): bit 8 = 0; bits 7:5 = Request Type; 
bits 4:0 = Request Specific] 

Figure 12.28 System Interface Command Syntax Bit Definition 

SysCmd(8) must be set to 0 for all system interface commands. 

SysCmd(7:5) specify the system interface request type, which may be 
read, write, or null. Table 12.6 lists the encoding of SysCmd(7:5). 



SysCmd(7:5)   Command
0             Read Request
1             Reserved
2             Write Request
3             Null Request
4-7           Reserved



Table 12.6 Encoding of SysCmd(7:5) for System Interface Commands 

SysCmd(4:0) are specific to each type of request and are defined in 
each of the following sections. 
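
As an illustration of the common command encoding, the following Python sketch separates commands from data identifiers using SysCmd(8) and then decodes SysCmd(7:5). The function and dictionary names are hypothetical, and the sketch assumes that the value 0 in SysCmd(7:5) encodes a read request, with 2 and 3 encoding write and null requests.

```python
# SysCmd(7:5) request types; values 1 and 4-7 are reserved.
REQUEST_TYPES = {0: "read", 2: "write", 3: "null"}

def is_data_identifier(syscmd):
    """SysCmd(8) = 1 marks a data identifier, 0 marks a command."""
    return ((syscmd >> 8) & 1) == 1

def request_type(syscmd):
    """Decode the request type of a 9-bit system interface command."""
    if not 0 <= syscmd < (1 << 9):
        raise ValueError("SysCmd is a 9-bit bus")
    if is_data_identifier(syscmd):
        raise ValueError("SysCmd(8) = 1: this is a data identifier")
    return REQUEST_TYPES.get((syscmd >> 5) & 0b111, "reserved")
```

For instance, a SysCmd value whose bits 7:5 are 010 decodes as a write request regardless of the request-specific bits 4:0.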

Read Requests 

Figure 12.29 shows the format of a SysCmd read request. 

[Bit layout of SysCmd(8:0): bit 8 = 0; bits 7:5 = 000; 
bits 4:0 = Read Request Specific (see tables)] 



Figure 12.29 Read Request SysCmd Bus Bit Definition 

Table 12.7, Table 12.8, and Table 12.9 list the encoding of SysCmd(4:0) 
for read requests. 



SysCmd(4:3)   Read Attributes
0-1           Reserved
2             Noncoherent block read
3             Doubleword, partial doubleword, word, or partial word



Table 12.7 Encoding of SysCmd(4:3) for Read Requests 



SysCmd(2)     Link Address Retained Indication
  0           Link address not retained
  1           Link address retained
SysCmd(1:0)   Read Block Size
  0           Reserved
  1           8 words
  2-3         Reserved



Table 12.8 Encoding of SysCmd(2:0) for Block Read Request 



SysCmd(2:0)   Read Data Size
0             1 byte valid (Byte)
1             2 bytes valid (Halfword)
2             3 bytes valid (Tribyte)
3             4 bytes valid (Word)
4             5 bytes valid (Quintibyte)
5             6 bytes valid (Sextibyte)
6             7 bytes valid (Septibyte)
7             8 bytes valid (Doubleword)



Table 12.9 Doubleword, Word, or Partial-word Read Request Data Size 
Encoding of SysCmd(2:0) 
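
The read request fields in Table 12.7 through Table 12.9 can be decoded as in the following Python sketch. It is illustrative only: the function name and the returned dictionary keys are hypothetical, and it assumes that read requests use 000 in SysCmd(7:5) and that the data size field encodes the byte count minus one.

```python
def decode_read_request(syscmd):
    """Decode SysCmd(4:0) of a read request command into a dictionary."""
    if ((syscmd >> 8) & 1) != 0 or ((syscmd >> 5) & 0b111) != 0:
        raise ValueError("not a read request command")
    attributes = (syscmd >> 3) & 0b11   # SysCmd(4:3)
    low = syscmd & 0b111                # SysCmd(2:0)
    if attributes == 2:
        # Block read: SysCmd(2) is the link-address flag, SysCmd(1:0) the size.
        size_code = low & 0b11
        return {
            "kind": "noncoherent block read",
            "link_address_retained": bool(low >> 2),
            "block_words": 8 if size_code == 1 else None,  # None = reserved
        }
    if attributes == 3:
        # Doubleword or smaller: SysCmd(2:0) is the number of valid bytes - 1.
        return {"kind": "doubleword or partial", "bytes": low + 1}
    return {"kind": "reserved"}
```

A block read of 8 words therefore carries 10 in SysCmd(4:3) and 01 in SysCmd(1:0), while a word read carries 11 in SysCmd(4:3) and 011 in SysCmd(2:0).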

Write Requests 

Figure 12.30 shows the format of a SysCmd write request. 



[Bit layout of SysCmd(8:0): bit 8 = 0; bits 7:5 = 010; 
bits 4:0 = Write Request Specific (see tables)] 






Figure 12.30 Write Request SysCmd Bus Bit Definition 

Table 12.10 lists the write attributes encoded in bits SysCmd(4:3). 
Table 12.11 lists the block write replacement attributes encoded in bits 
SysCmd(2:0). Table 12.12 lists the write request data size encoding in 
SysCmd(2:0). 



SysCmd(4:3)   Write Attributes
0             Reserved
1             Reserved
2             Block write
3             Doubleword, partial doubleword, word, or partial word



Table 12.10 Write Request Encoding of SysCmd(4:3) 



SysCmd(2)     Cache Line Replacement Attributes
  0           Cache line replaced
  1           Cache line retained
SysCmd(1:0)   Write Block Size
  0           Reserved
  1           8 words
  2-3         Reserved



Table 12.11 Block Write Request Encoding of SysCmd(2:0) 



SysCmd(2:0)   Write Data Size
0             1 byte valid (Byte)
1             2 bytes valid (Halfword)
2             3 bytes valid (Tribyte)
3             4 bytes valid (Word)
4             5 bytes valid (Quintibyte)
5             6 bytes valid (Sextibyte)
6             7 bytes valid (Septibyte)
7             8 bytes valid (Doubleword)

Table 12.12 Doubleword, Word, or Partial-word Write Request Data Size 
Encoding of SysCmd(2:0) 

Null Requests 

Figure 12.31 shows the format of a SysCmd null request. 





[Bit layout of SysCmd(8:0): bit 8 = 0; bits 7:5 = 011; 
bits 4:0 = Null Request Specific (see table)] 















Figure 12.31 Null Request SysCmd Bus Bit Definition 

System interface release external null requests use the null request 
command. Table 12.13 lists the encoding of SysCmd(4:3) for external 
null requests. SysCmd(2:0) are reserved for both instances of null 
requests. 



SysCmd(4:3)   Null Attributes
0             System interface release
1-3           Reserved



Table 12.13 External Null Request Encoding of SysCmd(4:3) 

System Interface Data Identifier Syntax 

This section defines the encoding of the SysCmd bus for system 
interface data identifiers. Figure 12.32 shows a common encoding scheme 
used for all system interface data identifiers. 



[Bit layout of SysCmd(8:0): bit 8 = 1; bit 7 = Last Data; bit 6 = Resp Data; 
bit 5 = Good Data; bit 4 = Data Check; remaining bits = Reserved] 



Figure 12.32 Data Identifier SysCmd Bus Bit Definition 

SysCmd(8) must be set to 1 for all system interface data identifiers. 
System interface data identifiers use the format for noncoherent data. 

Noncoherent Data 

Noncoherent data is defined as follows: 

• data that is associated with processor block write requests and 
processor doubleword, partial doubleword, word, or partial word write 
requests 

• data that is returned in response to a processor noncoherent block 
read request or a processor doubleword, partial doubleword, word, or 
partial word read request 

• data that is associated with external write requests 

• data that is returned in response to an external read request 

Data Identifier Bit Definitions 

SysCmd(7) marks the last data element and SysCmd(6) indicates 
whether or not the data is response data, for both processor and external 
coherent and noncoherent data identifiers. Response data is data 
returned in response to a read request. 

SysCmd(5) indicates whether or not the data element is error free. 
Erroneous data contains an uncorrectable error and is returned to the 
processor, forcing a bus error. The processor delivers data with the good 
data bit deasserted if a primary parity error is detected for a transmitted 
data item. 

SysCmd(4) indicates to the processor whether to check the data and 
check bits for this data element. 

SysCmd(3) is reserved for external data identifiers. 

SysCmd(4:3) are reserved for noncoherent processor data identifiers. 

SysCmd(2:0) are reserved for noncoherent data identifiers. 

Table 12.14 lists the encoding of SysCmd(7:3) for processor data 
identifiers. 



SysCmd(7)     Last Data Element Indication
  0           Last data element
  1           Not the last data element
SysCmd(6)     Response Data Indication
  0           Data is response data
  1           Data is not response data
SysCmd(5)     Good Data Indication
  0           Data is error free
  1           Data is erroneous
SysCmd(4:3)   Reserved



Table 12.14 Processor Data Identifier Encoding of SysCmd(7:3) 

Table 12.15 lists the encoding of SysCmd(7:3) for external data 
identifiers. 



SysCmd(7)     Last Data Element Indication
  0           Last data element
  1           Not the last data element
SysCmd(6)     Response Data Indication
  0           Data is response data
  1           Data is not response data
SysCmd(5)     Good Data Indication
  0           Data is error free
  1           Data is erroneous
SysCmd(4)     Data Checking Enable
  0           Check the data and check bits
  1           Do not check the data and check bits
SysCmd(3)     Reserved

Table 12.15 External Data Identifier Encoding of SysCmd(7:3) 
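
The data identifier bits can be unpacked as in the following Python sketch. It is illustrative only: the `DataIdentifier` tuple and the function name are hypothetical, and the sketch assumes that a 0 in each of SysCmd(7:4) is the asserted sense (last element, response data, error free, check enabled), with 1 marking erroneous data as stated for the Erroneous Data bit, SysCmd(5).

```python
from collections import namedtuple

DataIdentifier = namedtuple("DataIdentifier", "last response good check")

def decode_data_identifier(syscmd):
    """Unpack SysCmd(7:4) of an external data identifier."""
    if not 0 <= syscmd < (1 << 9):
        raise ValueError("SysCmd is a 9-bit bus")
    if ((syscmd >> 8) & 1) == 0:
        raise ValueError("SysCmd(8) = 0 marks a command, not a data identifier")

    def bit_is_zero(n):
        return ((syscmd >> n) & 1) == 0

    return DataIdentifier(
        last=bit_is_zero(7),      # 0 = last data element
        response=bit_is_zero(6),  # 0 = response data
        good=bit_is_zero(5),      # 0 = error free
        check=bit_is_zero(4),     # 0 = check the data and check bits
    )
```

For processor data identifiers SysCmd(4:3) are reserved, so the `check` field is only meaningful for identifiers driven by an external agent.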

System Interface Addresses 

System interface addresses are full 36-bit physical addresses presented 
on the least-significant 36 bits (bits 35 through 0) of the SysAD bus 
during address cycles; the remaining bits of the SysAD bus are unused 
during address cycles. 

Addressing Conventions 

Addresses associated with doubleword, partial doubleword, word, or 
partial word transactions, are aligned for the size of the data element. 
The system uses the following address conventions: 

• Addresses associated with block requests are aligned to double-word 
boundaries; that is, the low-order 3 bits of address are 0. 

• Doubleword requests set the low-order 3 bits of address to 0. 

• Word requests set the low-order 2 bits of address to 0. 

• Halfword requests set the low-order bit of address to 0. 

• Byte, tribyte, quintibyte, sextibyte, and septibyte requests use the 
byte address. 
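
The alignment conventions above can be summarized in a short check. The following Python sketch is illustrative; the function name is hypothetical, and it only tests the low-order address bits named in the list, together with the 36-bit physical address range.

```python
def is_aligned(address, size_bytes):
    """Check the low-order address bits for a request of size_bytes (1..8)."""
    if not 0 <= address < (1 << 36):
        raise ValueError("system interface addresses are 36-bit physical addresses")
    if size_bytes == 8:                 # doubleword: low-order 3 bits are 0
        return address & 0b111 == 0
    if size_bytes == 4:                 # word: low-order 2 bits are 0
        return address & 0b11 == 0
    if size_bytes == 2:                 # halfword: low-order bit is 0
        return address & 0b1 == 0
    if size_bytes in (1, 3, 5, 6, 7):   # these sizes use the byte address
        return True
    raise ValueError("size_bytes must be between 1 and 8")
```

Block requests follow the doubleword rule, since their addresses are aligned to doubleword boundaries.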

Subblock Ordering 

The order in which data is returned in response to a processor block 
read request is subblock ordering. In subblock ordering, the processor 
delivers the address of the requested doubleword within the block. An 
external agent must return the block of data using subblock ordering, 
starting with the addressed doubleword. 

A block of data elements (whether bytes, halfwords, words, or double- 
words) can be retrieved from storage in two ways: in sequential order, or 
using a subblock order. This section describes these retrieval methods, 
with an emphasis on subblock ordering. Note that the R4600/R4700 only 
uses subblock ordering for block reads. 

Example of Sequential Ordering 

Sequential ordering retrieves the data elements of a block in serial, or 
sequential, order. 

Figure 12.33 shows a sequential order in which DW0 is taken first and 
DW3 is taken last. 



[Diagram: a block of four doublewords, DW0 through DW3. DW0 taken first, 
DW1 taken second, DW2 taken third, DW3 taken fourth.] 



Figure 12.33 Retrieving a Data Block in Sequential Order 

Example of Subblock Ordering 

Subblock ordering allows the system to define the order in which the 
data elements are retrieved. The smallest data element of a block transfer 
for the R4600/R4700 is a doubleword, and Figure 12.34 shows the 
retrieval of a block of data that consists of 4 doublewords (the cache line 
size is 8 words), in which DW2 is taken first. 



[Diagram: an octalword made of two quadwords, holding doublewords DW0 
through DW3. Order of retrieval: DW2 taken first, DW3 taken second, 
DW0 taken third, DW1 taken fourth.] 



Figure 12.34 Retrieving Data in a Subblock Order 

Using the subblock ordering shown in Figure 12.34, the doubleword at 
the target address is retrieved first (DW2), followed by the remaining 
doubleword (DW3) in this quadword. Next, the doublewords of the quadword 
that fills out the octalword are retrieved in the same order as in the prior 
quadword (in this case DW0 is followed by DW1). 

It may be easier to understand subblock ordering by looking 
at the method used for generating the address of each doubleword as it is 
retrieved. The subblock ordering logic generates this address by 
executing a bit-wise exclusive-OR (XOR) of the starting block address with 
the output of a binary counter that increments with each doubleword, 
starting at doubleword zero (00₂). 

Using this scheme, Table 12.16, Table 12.17, and Table 12.18 list the 
subblock ordering of doublewords for an 8-word block, based on three 
different starting-block addresses: 10₂, 11₂, and 01₂. The subblock 
ordering is generated by an XOR of the subblock address (either 10₂, 11₂, 
or 01₂) with the binary count of the doubleword (00₂ through 11₂). Thus, 
the third doubleword retrieved from a block of data with a starting 
address of 10₂ is found by taking the XOR of address 10₂ with the binary 
count of DW2, 10₂. The result is 00₂, or DW0 (shown in Table 12.16). 



Cycle   Starting Block Address   Binary Count   Double Word Retrieved
1       10                       00             10
2       10                       01             11
3       10                       10             00
4       10                       11             01

Table 12.16 Sequence of Doublewords Transferred Using Subblock 
Ordering: Address 10₂ 



Cycle   Starting Block Address   Binary Count   Double Word Retrieved
1       11                       00             11
2       11                       01             10
3       11                       10             01
4       11                       11             00

Table 12.17 Sequence of Doublewords Transferred Using Subblock 
Ordering: Address 11₂ 



Cycle   Starting Block Address   Binary Count   Double Word Retrieved
1       01                       00             01
2       01                       01             00
3       01                       10             11
4       01                       11             10

Table 12.18 Sequence of Doublewords Transferred Using Subblock 
Ordering: Address 01₂ 
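
The XOR scheme described above can be reproduced in a few lines. The following Python sketch is illustrative (the function name is hypothetical); it XORs the starting doubleword index with a binary count that starts at zero, which yields the sequences listed in the three tables.

```python
def subblock_order(start, doublewords=4):
    """Doubleword indices in the order an external agent returns them.

    start is the index of the addressed doubleword within the block;
    each entry is start XOR the binary count 0, 1, 2, ...
    """
    if not 0 <= start < doublewords:
        raise ValueError("start must be a doubleword index within the block")
    return [start ^ count for count in range(doublewords)]
```

For a starting address of 10₂ the result is DW2, DW3, DW0, DW1, matching the retrieval order described for the subblock ordering example.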



For block write requests, the processor always delivers the address of 
the doubleword at the beginning of the block; the processor delivers data 
beginning with the doubleword at the beginning of the block and 
progresses sequentially through the doublewords that form the block. 

During data cycles, the valid byte lines depend upon the position of the 
data with respect to the aligned doubleword (this may be a byte, halfword, 
tribyte, quadbyte/word, quintibyte, sextibyte, septibyte, or an octalbyte/ 
doubleword). For example, in little-endian mode, on a byte request where 
the address modulo 8 is 0, SysAD(7:0) are valid during the data cycles. 

Table 12.19 shows the byte lanes used for partial word transfers for 
both little and big endian. 



# Bytes       Address   SysAD byte lanes used (big endian)
SysCmd(2:0)   Mod 8     63:56  55:48  47:40  39:32  31:24  23:16  15:8   7:0
1 (000)       0           •
              1                  •
              2                         •
              3                                •
              4                                       •
              5                                              •
              6                                                     •
              7                                                            •
2 (001)       0           •      •
              2                         •      •
              4                                       •      •
              6                                                     •      •
3 (010)       0           •      •      •
              1                  •      •      •
              4                                       •      •      •
              5                                              •      •      •
4 (011)       0           •      •      •      •
              4                                       •      •      •      •
5 (100)       0           •      •      •      •      •
              3                                •      •      •      •      •
6 (101)       0           •      •      •      •      •      •
              2                         •      •      •      •      •      •
7 (110)       0           •      •      •      •      •      •      •
              1                  •      •      •      •      •      •      •
8 (111)       0           •      •      •      •      •      •      •      •
                        7:0    15:8   23:16  31:24  39:32  47:40  55:48  63:56
                        SysAD byte lanes used (little endian)

Table 12.19 Partial Word Transfer Byte Lane Usage 
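
The byte lane selection can also be computed rather than looked up. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes that in little-endian mode byte address 0 maps to SysAD(7:0) (as in the example above) and that in big-endian mode byte address 0 maps to SysAD(63:56).

```python
def byte_lanes(addr_mod8, nbytes, big_endian):
    """Return the SysAD bit ranges (high, low) carrying valid bytes."""
    if not (0 <= addr_mod8 < 8 and 1 <= nbytes <= 8):
        raise ValueError("address offset must be 0..7 and size 1..8 bytes")
    if addr_mod8 + nbytes > 8:
        raise ValueError("transfer does not fit in the aligned doubleword")
    lanes = []
    for byte in range(addr_mod8, addr_mod8 + nbytes):
        # Lane 0 is SysAD(7:0); big-endian mode mirrors the lane order.
        lane = 7 - byte if big_endian else byte
        lanes.append((8 * lane + 7, 8 * lane))
    return lanes
```

The sketch accepts any offset that fits in the doubleword; it does not enforce the restricted set of offsets that the processor actually issues for each size.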

Processor Internal Address Map 

External reads and writes provide access to processor internal 
resources that may be of interest to an external agent. The processor 
decodes bits SysAD(6:0) of the address associated with an external read 
or write request to determine which processor internal resource is the 
target. 

However, the R4600/R4700 does not contain any resources that are 
readable through an external read request. Therefore, in response to an 
external read request the processor returns undefined data and a data 
identifier with its Erroneous Data bit, SysCmd(5), set. 

The Interrupt register is the only processor internal resource available 
for write access by an external request. The Interrupt register is accessed 
by an external write request with an address of 000₂ on bits 6:4 of the 
SysAD bus. 

The interrupt register is described in detail in Chapter 13, 
"R4600/R4700 Processor Interrupts." 

Introduction 

The R4600/R4700 processor supports the following interrupts: six 
hardware interrupts, one internal "timer interrupt," two software 
interrupts, and one nonmaskable interrupt. The processor takes an 
exception on any interrupt. 

This chapter describes the six hardware and single nonmaskable 
interrupts. A description of the software and the timer interrupts can be 
found in Chapter 5. CPU exception processing is also described in Chapter 
5. Floating-point exception processing is described in Chapter 6. 

Hardware Interrupts 

The six CPU hardware interrupts can be caused by external write 
requests to the R4600/R4700, or can be caused through dedicated 
interrupt pins. These pins are latched into an internal register by the rising 
edge of SClock. 

Nonmaskable Interrupt (NMI) 

The nonmaskable interrupt is caused either by an external write request 
to the R4600/R4700 or by a dedicated pin in the R4600/R4700. This pin 
is latched into an internal register by the rising edge of SClock. 

Asserting Interrupts 

External writes to the CPU are directed to various internal resources, 
based on an internal address map of the processor. When SysAD[6:0] = 0 
during the address cycle of an external write request, an external write to any 
address writes to an architecturally transparent register called the 
Interrupt register; this register is available for external write cycles, but not 
for external reads. 

During a data cycle, SysAD[22:16] are the write enables for the seven 
individual Interrupt register bits (0 = disabled, 1 = enabled) and SysAD[6:0] 
are the values to be written into these bits (0 = no interrupt, 1 = interrupt). 
This allows any subset of the Interrupt register to be set or cleared with a 
single write request. Figure 13.1 shows the mechanics of an external write 
to the Interrupt register. 
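
The effect of the write enables can be modeled as a masked update. The following Python sketch is illustrative (the function name is hypothetical); it takes the current 7-bit Interrupt register value and the SysAD value driven during the data cycle, and returns the new register value.

```python
def write_interrupt_register(current, sysad_data):
    """Apply an external write to the 7-bit Interrupt register.

    SysAD[22:16] are the write enables and SysAD[6:0] are the values;
    bits whose enable is 0 keep their current value.
    """
    enables = (sysad_data >> 16) & 0x7F
    values = sysad_data & 0x7F
    return (current & ~enables & 0x7F) | (values & enables)
```

For example, with enables 0000011₂ and values 0000010₂, bit 1 is set, bit 0 is cleared, and bits 6:2 are left unchanged.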



[Diagram: SysAD(6:0) supplies the interrupt values and SysAD(22:16) supplies 
the write enables for Interrupt register bits 6 through 0. See Figure 13.2 
and Figure 13.3.] 



Figure 13.1 Interrupt Register Bits and Enables 

Figure 13.2 shows how the R4600/R4700 interrupts are readable 
through the Cause register. The interrupt bits, Int*(5:0), are latched into 
the internal register by the rising edge of SClock. 

• Bit 5 of the Interrupt register in the R4600/R4700 is ORed with the 
Int*(5) pin and then multiplexed with the internal TimerInterrupt 
signal. This result is directly readable as bit 15 of the Cause register. 

• Bits 4:0 of the Interrupt register are bit-wise ORed with the current 
value of the interrupt pins Int*[4:0] and the result is directly readable 
as bits 14:10 of the Cause register. 
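
A minimal C model of this derivation follows. It is a sketch under stated assumptions: `cause_ip_bits` is a hypothetical name, the `pins` argument is taken to hold the latched interrupt inputs already expressed as 1 = asserted, and `use_timer` is a stand-in for whatever controls the multiplexer, which this section does not name.

```c
#include <assert.h>
#include <stdint.h>

/* int_reg:   Interrupt register bits 5:0.
 * pins:      latched interrupt inputs 5:0, 1 = asserted (assumption:
 *            the active-low Int* polarity is already accounted for).
 * use_timer: hypothetical multiplexer select; nonzero routes the timer
 *            interrupt to Cause bit 15 instead of interrupt 5.
 * timer:     state of the internal timer interrupt.
 * Returns the interrupt bits positioned as Cause register bits 15:10. */
static uint32_t cause_ip_bits(uint8_t int_reg, uint8_t pins,
                              int use_timer, int timer)
{
    uint32_t ored = (uint32_t)(int_reg | pins) & 0x3Fu;
    uint32_t low5 = ored & 0x1Fu;            /* -> Cause bits 14:10 */
    uint32_t bit5 = (ored >> 5) & 1u;
    uint32_t top  = use_timer ? (uint32_t)(timer != 0) : bit5; /* -> bit 15 */
    return (top << 15) | (low5 << 10);
}
```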



[Figure 13.2 diagram: Interrupt register bits 5:0 are ORed (internal OR 
gates) with the Int*(5:0) pins, which are latched by SClock. Bit 5 is then 
multiplexed with the Timer Interrupt. The results appear as IP7 through IP2 
of the Cause register. See Figure 13.4.] 



Figure 13.2 R4600/R4700 Interrupt Signals 

Figure 13.3 shows the internal derivation of the NMI signal, for the 
R4600/R4700 processor. 

The NMI* pin is latched into an internal register by the rising edge of 
SClock. Bit 6 of the Interrupt register is then ORed with the inverted value 
of NMI* to form the nonmaskable interrupt. Only the one falling edge of the 
latched signal will cause the NMI. 



[Figure 13.3 diagram: the NMI* pin is latched into an internal register by 
SClock, passed through an inverter, and ORed with Interrupt register bit 6 
to form the internal NMI signal.] 



Figure 13.3 R4600/R4700 Nonmaskable Interrupt Signal 

Figure 13.4 shows the masking of the R4600/R4700 interrupt signal. 

• Cause register bits 15:8 (IP7-IP0) are AND-ORed with Status register 
interrupt mask bits 15:8 (IM7-IM0) to mask individual interrupts. 

• Status register bit 0 is a global Interrupt Enable (IE). It is ANDed with 
the output of the AND-OR logic to produce the R4600/R4700 interrupt 
signal. 
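
The masking logic can be written as a short C predicate. This is a sketch of the AND-OR and AND functions only; `interrupt_signal` is a hypothetical name, and the two arguments are assumed to be the 32-bit Cause and Status register values.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when the masked interrupt signal is asserted, else 0. */
static int interrupt_signal(uint32_t cause, uint32_t status)
{
    /* AND each IP bit (Cause 15:8) with its IM bit (Status 15:8);
     * a nonzero result is the OR of those eight AND terms. */
    uint32_t masked = cause & status & 0x0000FF00u;

    /* Gate the result with the global Interrupt Enable, Status bit 0. */
    return (masked != 0u) && ((status & 1u) != 0u);
}
```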



[Figure 13.4 diagram: Cause register bits 15:8 (IP7-IP0) and Status register 
bits SR(15:8) (IM7-IM0) feed the AND-OR function. Its output is combined 
with Status register bit SR(0) (IE) in the AND function to produce the 
R4600/R4700 interrupt.] 



Figure 13.4 Masking of the R4600/R4700 Interrupts 

Introduction 

This chapter describes the Error Checking mechanism used in the 
R4600/R4700 processor. 

Error Checking in the Processor 

Error checking codes allow the processor to detect and sometimes 
correct errors made when moving data from one place to another. 
Two major types of data errors can occur in data transmission: 

• hard errors, which are permanent, arise from broken interconnects, 
internal shorts, or open leads 

• soft errors, which are transient, are caused by system noise, power 
surges, and alpha particles. 

Hard errors must be corrected by physical repair of the damaged 
equipment and restoration of data from backup. Soft errors can be 
corrected by using error checking and correcting codes. 

Types of Error Checking 

The R4600/R4700 uses parity (error detection only). 

Parity Error Detection 

Parity is the simplest error detection scheme. By appending a bit to the 
end of an item of data— called a parity bit— single bit errors can be 
detected; however, these errors cannot be corrected. 

There are two types of parity: 

• Odd Parity adds 1 to any even number of 1s in the data, making the 
total number of 1s odd (including the parity bit). 

• Even Parity adds 1 to any odd number of 1s in the data, making the 
total number of 1s even (including the parity bit). 

Odd and even parity are shown in the example below: 

Data(3:0)    Odd Parity Bit    Even Parity Bit 

0 0 1 0            0                  1 

The example above shows a single bit in Data(3:0) with a value of 1; this 
bit is Data(1). 

• In even parity, the parity bit is set to 1. This makes 2 (an even number) 
the total number of bits with a value of 1. 

• Odd parity makes the parity bit a 0 to keep the total number of 1-value 
bits an odd number: in the case shown above, the single bit Data(1). 

The example below shows odd and even parity bits for various data 
values: 



Data(3:0)    Odd Parity Bit    Even Parity Bit 

0 1 1 0            1                  0 
0 0 0 0            1                  0 
1 1 1 1            1                  0 
1 1 0 1            0                  1 



Parity allows single-bit error detection, but it does not indicate which bit 
is in error. For example, suppose an odd-parity value of 00011 arrives. 
The last bit is the parity bit, and since odd parity demands an odd number 
(1, 3, 5) of 1s, this data is in error: it has an even number of 1s. However, 
it is impossible to tell which bit is in error. 
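
The parity rules above are easy to check in C. The sketch below works on 4-bit data values like those in the examples; all function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Count the 1 bits in the low four bits of data. */
static unsigned count_ones4(uint8_t data)
{
    unsigned n = 0;
    for (unsigned bit = 0; bit < 4; bit++)
        n += (data >> bit) & 1u;
    return n;
}

/* Even parity: the parity bit makes the total number of 1s even. */
static unsigned even_parity_bit(uint8_t data)
{
    return count_ones4(data) & 1u;
}

/* Odd parity: the parity bit makes the total number of 1s odd. */
static unsigned odd_parity_bit(uint8_t data)
{
    return even_parity_bit(data) ^ 1u;
}

/* Check received data plus its parity bit against odd parity.
 * A failure says only that some bit is wrong, not which one. */
static int odd_parity_ok(uint8_t data, unsigned parity_bit)
{
    return ((count_ones4(data) + (parity_bit & 1u)) & 1u) == 1u;
}
```

The received value 00011 from the text corresponds to data 0001 with parity bit 1, so `odd_parity_ok(0x1, 1)` reports an error.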

Error Checking Operation 

The processor verifies data correctness by using parity as it passes data 
from/to the system interface to/from the primary caches. 

System Interface 

The processor generates correct check bits for doubleword, word, or 
partial-word data transmitted to the system interface. As it checks for data 
correctness, the processor passes data check bits from the primary cache, 
directly without changing the bits, to the system interface. 

The processor does not check data received from the system interface for 
external writes. By setting the NChck bit in the data identifier, it is possible 
to prevent the processor from checking read response data from the 
system interface. 

For cache refill, if the NChck bit is set, the CPU will generally correct 
parity before placing data into the cache. The R4600/R4700 only checks 
parity for the first double word returned on a block instruction fetch, that 
is, for the double word that contains the instruction that missed in the 
cache. This double word is checked just as if it had been read out 
of the ICache. This parity check is done as a byte parity check. For single 
read, and with the NChck bit set, the CPU will check parity for all 64 bits, 
even if the transfer size is less than that. 

When the R4600/R4700 is checking parity it does not actually 
regenerate the word parity, but rather turns the byte parity supplied by the 
system into word parity. It XORs the bits in groups of four. As a result, if 
bad byte parity is supplied by the system, bad word parity will get written 
into the cache. This is done to be consistent with what happens in the 
DCache. 
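
The byte-to-word parity conversion can be sketched as follows. This is an illustration under assumptions not stated in this section: the eight byte-parity bits of a doubleword are packed into one byte, bits 3:0 belong to one word and bits 7:4 to the other, and parity is even (so the XOR of four byte-parity bits is the parity of the whole word). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* XOR-reduce the low four bits of v to a single bit. */
static unsigned xor4(unsigned v)
{
    v &= 0xFu;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

/* Turn eight byte-parity bits into two word-parity bits by XORing them
 * in groups of four. No parity is regenerated from the data, so a bad
 * byte-parity bit from the system yields a bad word-parity bit. */
static unsigned word_parity_from_byte_parity(uint8_t byte_parity)
{
    unsigned low  = xor4(byte_parity);                 /* assumed: bytes 0-3 */
    unsigned high = xor4((unsigned)byte_parity >> 4);  /* assumed: bytes 4-7 */
    return (high << 1) | low;
}
```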

The processor does not check addresses received from the system 
interface and does not generate correct check bits for addresses 
transmitted to the system interface. 

The processor does not contain a data corrector; instead, the processor 
takes a cache error exception when it detects an error based on data check 
bits. Software is responsible for error handling. 

System Interface Command Bus 

In the R4600/R4700 processor, the system interface command bus has 
no parity. SysCmdP always drives zero out for CPU valid cycles and is not 
checked when the system interface is in slave state. 






R4600/R4700 Error Checking 



Chapter 14 



Summary of Error Checking Operations 

Error Checking operations are summarized in Table 14.1 and 
Table 14.2. 



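Bus: Processor Data 
    Uncached Load:                             From System Interface 
    Uncached Store:                            Not Checked 
    Primary Cache Load from System Interface:  From System Interface unchanged 
    Primary Cache Write to System Interface:   Checked; Trap on Error 
    Cache Instruction:                         Check on cache write-back; Trap on Error 

Bus: System Interface Address/Command and Check Bits: Transmit 
    Uncached Load:                             Not Generated 
    Uncached Store:                            Not Generated 
    Primary Cache Load from System Interface:  Not Generated 
    Primary Cache Write to System Interface:   Not Generated 
    Cache Instruction:                         Not Generated 

Bus: System Interface Address/Command and Check Bits: Receive 
    Uncached Load:                             Not Checked 
    Uncached Store:                            NA 
    Primary Cache Load from System Interface:  Not Checked 
    Primary Cache Write to System Interface:   NA 
    Cache Instruction:                         NA 

Bus: System Interface Data 
    Uncached Load:                             Checked; Trap on Error 
    Uncached Store:                            From Processor 
    Primary Cache Load from System Interface:  Checked; Trap on Error 
    Primary Cache Write to System Interface:   From Primary Cache 
    Cache Instruction:                         From Primary Cache 

Bus: System Interface Data Check Bits 
    Uncached Load:                             Checked; Trap on Error 
    Uncached Store:                            Generated 
    Primary Cache Load from System Interface:  Checked; Trap on Error 
    Primary Cache Write to System Interface:   From Primary Cache 
    Cache Instruction:                         From Primary Cache 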


Table 14.1 Error Checking and Correcting Summary for Internal Transactions 



Bus                                                            Read Request      Write Request 

Processor Data                                                 NA                NA 
System Interface Address, Command, and Check Bits: Transmit    Generated         NA 
System Interface Address, Command, and Check Bits: Receive     Not Checked       Not Checked 
System Interface Data                                          From Processor    Checked; Trap on Error 
System Interface Data Check Bits                               Generated         Checked; Trap on Error 


Table 14.2 Error Checking and Correcting Summary for External Transactions 

Introduction 

This appendix provides a detailed description of the operation of each 
R4600/R4700 instruction. The instructions are listed in alphabetical 
order. 

Exceptions that may occur due to the execution of each instruction are 
listed after the description of each instruction. Descriptions of the 
immediate cause and manner of handling exceptions are omitted from the 
instruction descriptions in this appendix. 

Figures at the end of this appendix list the bit encoding for the constant 
fields of each instruction, and the bit encoding for each individual 
instruction is included with that instruction. 

Instruction Classes 

CPU instructions are divided into the following classes: 

• Load and Store instructions move data between memory and general 
registers. They are all I-type instructions, since the only addressing 
mode supported is base register + 16-bit immediate offset. 

• Computational instructions perform arithmetic, logical and shift 
operations on values in registers. They occur in both R-type (both 
operands are registers) and I-type (one operand is a 16-bit immediate) 
formats. 

• Jump and Branch instructions change the control flow of a program. 
Jumps are always made to absolute 26-bit word addresses (J-type 
format), or register addresses (R-type), for returns and dispatches. 
Branches have 16-bit offsets relative to the program counter (I-type). 
Jump and Link instructions save their return address in register 31. 

• Coprocessor instructions perform operations in the coprocessors. 
Coprocessor loads and stores are I-type. Coprocessor computational 
instructions have coprocessor-dependent formats (see the FPU 
instructions in Appendix B). Coprocessor zero (CP0) instructions 
manipulate the memory management and exception handling facilities of 
the processor. 

• Special instructions perform a variety of tasks, including movement 
of data between special and general registers, trap, and breakpoint. 
They are always R-type. 

Instruction Formats 

Every CPU instruction consists of a single word (32 bits) aligned on a 
word boundary. The major instruction formats are shown in Figure A.1. 



I-Type (Immediate) 

 31     26 25   21 20   16 15                     0
|   op    |  rs   |  rt   |       immediate        |

J-Type (Jump) 

 31     26 25                                     0
|   op    |                 target                 |

R-Type (Register) 

 31     26 25   21 20   16 15   11 10    6 5      0
|   op    |  rs   |  rt   |  rd   | shamt | funct  |



op 6-bit operation code 

rs 5-bit source register specifier 

rt 5-bit target (source/destination) or branch condition 

immediate 16-bit immediate, branch displacement or address 
displacement 

target 26-bit jump target address 

rd 5-bit destination register specifier 

shamt 5-bit shift amount 

funct 6-bit function field 



Figure A.1 CPU Instruction Formats 
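
As an illustration of these formats, the following C sketch extracts every field from a 32-bit instruction word. The structure and function names are hypothetical. The sketch extracts all fields unconditionally; which of them are meaningful depends on the format of the instruction being decoded.

```c
#include <assert.h>
#include <stdint.h>

/* All fields of the I-, J-, and R-type formats, extracted blindly. */
typedef struct {
    unsigned op;         /* bits 31:26, 6-bit operation code        */
    unsigned rs;         /* bits 25:21, 5-bit source register       */
    unsigned rt;         /* bits 20:16, 5-bit target register       */
    unsigned rd;         /* bits 15:11, 5-bit destination register  */
    unsigned shamt;      /* bits 10:6,  5-bit shift amount          */
    unsigned funct;      /* bits 5:0,   6-bit function field        */
    unsigned immediate;  /* bits 15:0,  16-bit immediate (I-type)   */
    uint32_t target;     /* bits 25:0,  26-bit jump target (J-type) */
} mips_fields;

static mips_fields decode_fields(uint32_t insn)
{
    mips_fields f;
    f.op        = (insn >> 26) & 0x3Fu;
    f.rs        = (insn >> 21) & 0x1Fu;
    f.rt        = (insn >> 16) & 0x1Fu;
    f.rd        = (insn >> 11) & 0x1Fu;
    f.shamt     = (insn >> 6)  & 0x1Fu;
    f.funct     = insn & 0x3Fu;
    f.immediate = insn & 0xFFFFu;
    f.target    = insn & 0x03FFFFFFu;
    return f;
}
```

For instance, the word 0x00221820 decodes as op = 0 (SPECIAL), rs = 1, rt = 2, rd = 3, funct = 100000 (ADD).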

Instruction Notation Conventions 

In this appendix, all variable subfields in an instruction format (such 
as rs, rt, immediate, etc.) are shown in lowercase names. 

For the sake of clarity, we sometimes use an alias for a variable subfield 
in the formats of specific instructions. For example, we use rs = base in 
the format for load and store instructions. Such an alias is always lower 
case, since it refers to a variable subfield. 

Figures with the actual bit encoding for all the mnemonics are located 
at the end of this Appendix, and the bit encoding also accompanies each 
instruction. 

In the instruction descriptions that follow, the Operation section 
describes the operation performed by each instruction using a high-level 
language notation. 






CPU Instruction Set Details 



Appendix A 



Special symbols used in the notation are described in Table A.1. 

Symbol          Meaning 

<-              Assignment. 

||              Bit string concatenation. 

x^{y}           Replication of bit value x into a y-bit string. Note: x is 
                always a single-bit value. 

x_{y:z}         Selection of bits y through z of bit string x. Little-endian 
                bit notation is always used. If y is less than z, this 
                expression is an empty (zero length) bit string. 

+               2's complement or floating-point addition. 

-               2's complement or floating-point subtraction. 

*               2's complement or floating-point multiplication. 

div             2's complement integer division. 

mod             2's complement modulo. 

/               Floating-point division. 

<               2's complement less than comparison. 

and             Bit-wise logical AND. 

or              Bit-wise logical OR. 

xor             Bit-wise logical XOR. 

nor             Bit-wise logical NOR. 

GPR[x]          General-Register x. The content of GPR[0] is always zero. 
                Attempts to alter the content of GPR[0] have no effect. 

CPR[z,x]        Coprocessor unit z, general register x. 

CCR[z,x]        Coprocessor unit z, control register x. 

COC[z]          Coprocessor unit z condition signal. 

BigEndianMem    Big-endian mode as configured at reset (0 -> Little, 
                1 -> Big). Specifies the endianness of the memory interface 
                (see LoadMemory and StoreMemory), and the endianness of 
                Kernel and Supervisor mode execution. 

ReverseEndian   Signal to reverse the endianness of load and store 
                instructions in User mode; effected by setting the RE bit of 
                the Status register. Thus, ReverseEndian may be computed as 
                (SR_{25} and User mode). 

BigEndianCPU    The endianness for load and store instructions (0 -> Little, 
                1 -> Big). In User mode, this endianness may be reversed by 
                setting SR_{25}. Thus, BigEndianCPU may be computed as 
                BigEndianMem XOR ReverseEndian. 

LLbit           Bit of state to specify synchronization instructions. Set by 
                LL, cleared by ERET and Invalidate, and read by SC. 

T+i:            Indicates the time steps between operations. Each of the 
                statements within a time step are defined to be executed in 
                sequential order (as modified by conditional and loop 
                constructs). Operations which are marked T+i: are executed 
                at instruction cycle i relative to the start of execution of 
                the instruction. Thus, an instruction which starts at time j 
                executes operations marked T+i: at time i + j. The 
                interpretation of the order of execution between two 
                instructions or two operations which execute at the same 
                time should be pessimistic; the order is not defined. 



Table A.1 CPU Instruction Operation Notations 

Instruction Notation Examples 

The following examples illustrate the application of some of the 
instruction notation conventions: 

Example #1: 

GPR[rt] <- immediate || 0^{16} 

Sixteen zero bits are concatenated with an immediate value 
(typically 16 bits), and the 32-bit string (with the lower 16 bits 
set to zero) is assigned to General-Purpose Register rt. 

Example #2: 

(immediate_{15})^{16} || immediate_{15:0} 

Bit 15 (the sign bit) of an immediate value is extended for 
16 bit positions, and the result is concatenated with bits 15 
through 0 of the immediate value to form a 32-bit sign-extended 
value. 
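
The two notation examples map onto ordinary shift and mask operations. The C sketch below is one way to express them; the helper names are hypothetical, and `replicate_bit` models the replication operator for results of up to 32 bits only.

```c
#include <assert.h>
#include <stdint.h>

/* Replication: a string of `count` copies of a single bit (count <= 32). */
static uint32_t replicate_bit(unsigned bit, unsigned count)
{
    assert(count <= 32u);
    if (count == 0u || (bit & 1u) == 0u)
        return 0u;
    return (count == 32u) ? 0xFFFFFFFFu : (((uint32_t)1 << count) - 1u);
}

/* Example #1: the immediate concatenated with sixteen zero bits. */
static uint32_t imm_concat_zero16(uint16_t immediate)
{
    return (uint32_t)immediate << 16;
}

/* Example #2: bit 15 replicated 16 times, concatenated with bits 15:0. */
static uint32_t sign_extend16(uint16_t immediate)
{
    uint32_t sign_bits = replicate_bit((unsigned)immediate >> 15, 16u);
    return (sign_bits << 16) | immediate;
}
```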









Load and Store Instructions 

In the R4600/R4700, as in the case of processors, the instruction 
immediately following a load may use the loaded contents of the register. 
In such cases, the hardware interlocks, requiring additional real cycles, so 
scheduling load delay slots is still desirable, although not required for 
functional code. 

Two special instructions are provided in the R4600/R4700 
implementation of the MIPS ISA, Load Linked and Store Conditional. 
These instructions are used in carefully coded sequences to provide one of 
several synchronization primitives, including test-and-set, bit-level locks, 
semaphores, and sequencers/event counts. 

In the load and store descriptions, the functions listed in Table A.2 are 
used to summarize the handling of virtual addresses and physical 
memory. 



Function              Meaning 

AddressTranslation    Uses the TLB to find the physical address given the 
                      virtual address. The function fails and an exception 
                      is taken if the required translation is not present in 
                      the TLB. 

LoadMemory            Uses the cache and main memory to find the contents of 
                      the word containing the specified physical address. 
                      The low-order two bits of the address and the Access 
                      Type field indicate which of the four bytes within the 
                      data word need to be returned. If the cache is enabled 
                      for this access, the entire word is returned and 
                      loaded into the cache. 

StoreMemory           Uses the cache, write buffer, and main memory to store 
                      the word or part of word specified as data in the word 
                      containing the specified physical address. The 
                      low-order two bits of the address and the Access Type 
                      field indicate which of the four bytes within the data 
                      word should be stored. 



Table A.2 Load and Store Common Functions 

As shown in Table A.3, the Access Type field indicates the size of the 
data item to be loaded or stored. Regardless of access type or byte- 
numbering order (endianness), the address specifies the byte which has 
the smallest byte address in the addressed field. For a big-endian 
machine, this is the leftmost byte and contains the sign for a 2's 
complement number; for a little-endian machine, this is the rightmost 
byte. 



Access Type Mnemonic    Value    Meaning 

DOUBLEWORD              7        8 bytes (64 bits) 
SEPTIBYTE               6        7 bytes (56 bits) 
SEXTIBYTE               5        6 bytes (48 bits) 
QUINTIBYTE              4        5 bytes (40 bits) 
WORD                    3        4 bytes (32 bits) 
TRIPLEBYTE              2        3 bytes (24 bits) 
HALFWORD                1        2 bytes (16 bits) 
BYTE                    0        1 byte (8 bits) 



Table A.3 Access Type Specifications for Loads/Stores 

The bytes within the addressed doubleword which are used can be 
determined directly from the access type and the three low-order bits of the 
address. 
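
One way to picture this is as a byte mask indexed by byte address within the doubleword. The C sketch below is illustrative only: `bytes_used` is a hypothetical helper, bit i of its result stands for the byte at address offset i (which physical bus lane that byte occupies depends on endianness and is not modeled), and combinations that would run past the doubleword are simply reported as 0.

```c
#include <assert.h>
#include <stdint.h>

/* access_type: value from the access type table (0 = BYTE ... 7 = DOUBLEWORD),
 *              so the item is access_type + 1 bytes long.
 * addr:        address of the byte with the smallest byte address.
 * Returns a mask with bit i set for each byte offset i that is used,
 * or 0 if the item would not fit inside the addressed doubleword. */
static uint8_t bytes_used(unsigned access_type, uint64_t addr)
{
    unsigned nbytes = access_type + 1u;
    unsigned first  = (unsigned)(addr & 7u);   /* three low-order bits */

    if (access_type > 7u || first + nbytes > 8u)
        return 0u;
    return (uint8_t)(((1u << nbytes) - 1u) << first);
}
```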

Jump and Branch Instructions 

All jump and branch instructions have an architectural delay of exactly 
one instruction. That is, the instruction immediately following a jump or 
branch (that is, occupying the delay slot) is always executed while the 
target instruction is being fetched from storage. A delay slot may not itself 
be occupied by a jump or branch instruction; however, this error is not 
detected and the results of such an operation are undefined. 

If an exception or interrupt prevents the completion of a legal 
instruction during a delay slot, the hardware sets the EPC register to point 
at the jump or branch instruction that precedes it. When the code is 
restarted, both the jump or branch instructions and the instruction in the 
delay slot are reexecuted. 

Because jump and branch instructions may be restarted after 
exceptions or interrupts, they must be restartable. Therefore, when a 
jump or branch instruction stores a return link value, register 31 (the 
register in which the link is stored) may not be used as a source register. 

Since instructions must be word-aligned, a Jump Register or Jump 
and Link Register instruction must use a register whose two low-order 
bits are zero. If these low-order bits are not zero, an address exception will 
occur when the jump target instruction is subsequently fetched. 

Coprocessor Instructions 

Coprocessors are alternate execution units, which have register files 
separate from the CPU. The R4600/R4700 architecture (MIPS III) provides 
three coprocessor units, or classes, and these coprocessors have two 
register spaces, each space containing thirty-two registers. These registers 
may be either 32-bits or 64-bits wide. 

• The first space, coprocessor general registers, may be directly loaded 
from memory and stored into memory, and their contents may be 
transferred between the coprocessor and processor. 

• The second space, coprocessor control registers, may only have their 
contents transferred directly between the coprocessor and the 
processor. Coprocessor instructions may alter registers in either space. 

System Control Coprocessor (CP0) Instructions 

There are some special limitations imposed on operations involving 
CP0, which is incorporated within the CPU. The move to/from coprocessor 
instructions are the only valid mechanism for writing to and reading from 
the CP0 registers. 

Several CP0 instructions are defined to directly read, write, and probe 
TLB entries and to modify the operating modes in preparation for returning 
to User mode or interrupt-enabled states. 

ADD                          Add                          ADD 

 31     26 25   21 20   16 15   11 10    6 5      0
| SPECIAL |  rs   |  rt   |  rd   |   0   |  ADD   |
| 000000  |       |       |       | 00000 | 100000 |



Format: 

ADD rd, rs, rt 

Description: 

The contents of general register rs and the contents of general register 
rt are added to form the result. The result is placed into general register 
rd. The operands must be valid sign-extended, 32-bit values. 

An overflow exception occurs if the carries out of bits 30 and 31 differ 
(2's complement overflow). The destination register rd is not modified when 
an integer overflow exception occurs. 

Operation: 



T:  temp <- GPR[rs] + GPR[rt] 
    GPR[rd] <- (temp_{31})^{32} || temp_{31:0} 



Exceptions: 

Integer overflow exception 
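
The overflow rule (carries out of bits 30 and 31 differ) can be checked directly in C. The sketch below models only the arithmetic, not the exception itself; both function names are hypothetical, and the operands are taken as the low 32 bits of the source registers.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when a 32-bit 2's complement add of a and b overflows,
 * that is, when the carries out of bit 30 and bit 31 differ. */
static int add32_overflows(uint32_t a, uint32_t b)
{
    /* Carry out of bit 30: add the low 31 bits and look at bit 31. */
    uint32_t c30 = (((a & 0x7FFFFFFFu) + (b & 0x7FFFFFFFu)) >> 31) & 1u;
    /* Carry out of bit 31: bit 32 of the full-width sum. */
    uint32_t c31 = (uint32_t)((((uint64_t)a + (uint64_t)b) >> 32) & 1u);
    return c30 != c31;
}

/* The value written when no exception is taken: the 32-bit sum with
 * bit 31 replicated into the upper 32 bits of the 64-bit register. */
static uint64_t add32_result(uint32_t a, uint32_t b)
{
    uint32_t temp = a + b;                       /* wraps modulo 2^32 */
    uint64_t upper = (temp & 0x80000000u) ? 0xFFFFFFFF00000000ull : 0ull;
    return upper | temp;
}
```

In this model, ADD would write `add32_result` only when `add32_overflows` returns 0; an instruction that never takes the overflow exception would always write it.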

ADDI                    Add Immediate                    ADDI 

 31     26 25   21 20   16 15                     0
|  ADDI   |  rs   |  rt   |       immediate        |
| 001000  |       |       |                        |
     6        5       5               16







Format: 

ADDI rt, rs, immediate 

Description: 

The 16-bit immediate is sign-extended and added to the contents of 
general register rs to form the result. The result is placed into general 
register rt. The rs operand must be a valid sign-extended, 32-bit value. 

An overflow exception occurs if carries out of bits 30 and 31 differ (2's 
complement overflow). The destination register rt is not modified when an 
integer overflow exception occurs. 

Operation: 



T:  temp <- GPR[rs] + (immediate_{15})^{48} || immediate_{15:0} 
    GPR[rt] <- (temp_{31})^{32} || temp_{31:0} 



Exceptions: 

Integer overflow exception 

ADDIU              Add Immediate Unsigned              ADDIU 

 31     26 25   21 20   16 15                     0
|  ADDIU  |  rs   |  rt   |       immediate        |
| 001001  |       |       |                        |
     6        5       5               16







Format: 

ADDIU rt, rs, immediate 

Description: 

The 16-bit immediate is sign-extended and added to the contents of 
general register rs to form the result. The result is placed into general 
register rt. No integer overflow exception occurs under any circumstances. 
The rs operand must be a valid sign-extended, 32-bit value. 

The only difference between this instruction and the ADDI instruction 
is that ADDIU never causes an overflow exception. 

Operation: 



T:  temp <- GPR[rs] + (immediate_{15})^{48} || immediate_{15:0} 
    GPR[rt] <- (temp_{31})^{32} || temp_{31:0} 



Exceptions: 

None 

ADDU                    Add Unsigned                    ADDU 

 31     26 25   21 20   16 15   11 10    6 5      0
| SPECIAL |  rs   |  rt   |  rd   |   0   |  ADDU  |
| 000000  |       |       |       | 00000 | 100001 |



Format: 

ADDU rd, rs, rt 

Description: 

The contents of general register rs and the contents of general register 
rt are added to form the result. The result is placed into general register rd. 
No overflow exception occurs under any circumstances. The source 
operands must be valid sign-extended, 32-bit values. 

The only difference between this instruction and the ADD instruction 
is that ADDU never causes an overflow exception. 

Operation: 



T:  temp <- GPR[rs] + GPR[rt] 
    GPR[rd] <- (temp_{31})^{32} || temp_{31:0} 



Exceptions: 

None 

AND                          And                          AND 

 31     26 25   21 20   16 15   11 10    6 5      0
| SPECIAL |  rs   |  rt   |  rd   |   0   |  AND   |
| 000000  |       |       |       | 00000 | 100100 |



Format: 

AND rd, rs, rt 

Description: 

The contents of general register rs are combined with the contents of 
general register rt in a bit-wise logical AND operation. The result is placed 
into general register rd. 

Operation: 



T: GPR[rd] <- GPR[rs] and GPR[rt] 



Exceptions: 

None 

ANDI                    And Immediate                    ANDI 

 31     26 25   21 20   16 15                     0
|  ANDI   |  rs   |  rt   |       immediate        |
| 001100  |       |       |                        |
     6        5       5               16







Format: 

ANDI rt, rs, immediate 

Description: 

The 16-bit immediate is zero-extended and combined with the contents 
of general register rs in a bit-wise logical AND operation. The result is 
placed into general register rt. 

Operation: 



T:  GPR[rt] <- 0^{48} || (immediate and GPR[rs]_{15:0}) 



Exceptions: 

None 

BCzF           Branch On Coprocessor z False           BCzF 

 31     26 25   21 20   16 15                     0
|  COPz   |  BC   |  BCF  |         offset         |
| 0100xx* | 01000 | 00000 |                        |
     6        5       5               16







Format: 

BCzF offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. If coprocessor z's condition signal (CpCond), as 
sampled during the previous instruction, is false, then the program 
branches to the target address with a delay of one instruction. 

Because the internal condition signal is sampled during the previous 
instruction, there must be at least one instruction between this instruction 
and a coprocessor instruction that changes the internal condition signal. 

Operation: 



T-1:  condition <- not COC[z] 
T:    target <- (offset_{15})^{46} || offset || 0^{2} 
T+1:  if condition then 
          PC <- PC + target 
      endif 



Note: *See the table "Opcode Bit Encoding" on next page, or "CPU 
Instruction Opcode Bit Encoding" at the end of Appendix A. 

Exceptions: 

Coprocessor unusable exception 



Opcode Bit Encoding: 

BCzF   Bit #   31 30 29 28 27 26   25 24 23 22 21   20 19 18 17 16 

       BC0F     0  1  0  0  0  0    0  1  0  0  0    0  0  0  0  0 
       BC1F     0  1  0  0  0  1    0  1  0  0  0    0  0  0  0  0 
       BC2F     0  1  0  0  1  0    0  1  0  0  0    0  0  0  0  0 

Bits 31-26: opcode (bits 27-26 give the coprocessor unit number). 
Bits 25-21: BC sub-opcode. Bits 20-16: branch condition. 

BCzFL       Branch On Coprocessor z False Likely       BCzFL 

 31     26 25   21 20   16 15                     0
|  COPz   |  BC   | BCFL  |         offset         |
| 0100xx* | 01000 | 00010 |                        |
     6        5       5               16







Format: 

BCzFL offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset shifted left two bits 
and sign-extended. If the contents of coprocessor z's condition signal, as 
sampled during the previous instruction, is false, the target address is 
branched to with a delay of one instruction. 

If the conditional branch is not taken, the instruction in the branch 
delay slot is nullified. 

Because the internal condition signal is sampled during the previous 
instruction, there must be at least one instruction between this instruction 
and a coprocessor instruction that changes the internal condition signal. 
NOTE: *See the table "Opcode Bit Encoding" on next page, or "CPU Instruction 
Opcode Bit Encoding" at the end of Appendix A. 

Operation: 



T-1:  condition <- not COC[z] 
T:    target <- (offset_{15})^{46} || offset || 0^{2} 
T+1:  if condition then 
          PC <- PC + target 
      else 
          NullifyCurrentInstruction 
      endif 





Exceptions: 

Coprocessor unusable exception 

Opcode Bit Encoding: 



BCzFL  Bit #   31 30 29 28 27 26   25 24 23 22 21   20 19 18 17 16 

       BC0FL    0  1  0  0  0  0    0  1  0  0  0    0  0  0  1  0 
       BC1FL    0  1  0  0  0  1    0  1  0  0  0    0  0  0  1  0 
       BC2FL    0  1  0  0  1  0    0  1  0  0  0    0  0  0  1  0 

Bits 31-26: opcode (bits 27-26 give the coprocessor unit number). 
Bits 25-21: BC sub-opcode. Bits 20-16: branch condition. 

BCzT           Branch On Coprocessor z True           BCzT 

 31     26 25   21 20   16 15                     0
|  COPz   |  BC   |  BCT  |         offset         |
| 0100xx* | 01000 | 00001 |                        |
     6        5       5               16



Format: 

BCzT offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. If the coprocessor z's condition signal (CpCond) is 
true, then the program branches to the target address, with a delay of one 
instruction. 

Because the internal condition signal is sampled during the previous 
instruction, there must be at least one instruction between this instruction 
and a coprocessor instruction that changes the internal condition signal. 

Operation: 



T-1:  condition <- COC[z] 
T:    target <- (offset_{15})^{46} || offset || 0^{2} 
T+1:  if condition then 
          PC <- PC + target 
      endif 



NOTE: *See the table "Opcode Bit Encoding" on next page, or "CPU Instruction 
Opcode Bit Encoding" at the end of Appendix A. 

Exceptions: 

Coprocessor unusable exception 



Opcode Bit Encoding: 

BCzT   Bit #   31 30 29 28 27 26   25 24 23 22 21   20 19 18 17 16 

       BC0T     0  1  0  0  0  0    0  1  0  0  0    0  0  0  0  1 
       BC1T     0  1  0  0  0  1    0  1  0  0  0    0  0  0  0  1 
       BC2T     0  1  0  0  1  0    0  1  0  0  0    0  0  0  0  1 

Bits 31-26: opcode (bits 27-26 give the coprocessor unit number). 
Bits 25-21: BC sub-opcode. Bits 20-16: branch condition. 

BCzTL       Branch On Coprocessor z True Likely       BCzTL 

 31     26 25   21 20   16 15                     0
|  COPz   |  BC   | BCTL  |         offset         |
| 0100xx* | 01000 | 00011 |                        |
     6        5       5               16







Format: 

BCzTL offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. If the contents of coprocessor z's condition signal, as 
sampled during the previous instruction, is true, the target address is 
branched to with a delay of one instruction. 

If the conditional branch is not taken, the instruction in the branch 
delay slot is nullified. 

Because the internal condition signal is sampled during the previous 
instruction, there must be at least one instruction between this instruction 
and a coprocessor instruction that changes the internal condition signal. 

Operation: 



T-1:  condition <- COC[z] 
T:    target <- (offset_{15})^{46} || offset || 0^{2} 
T+1:  if condition then 
          PC <- PC + target 
      else 
          NullifyCurrentInstruction 
      endif 



NOTE: *See the table "Opcode Bit Encoding" on next page, or "CPU Instruction 
Opcode Bit Encoding" at the end of Appendix A. 

Exceptions: 

Coprocessor unusable exception 



Opcode Bit Encoding: 




























BCzTL  Bit #  31 30 29 28 27 26  25 24 23 22 21  20 19 18 17 16
BC0TL          0  1  0  0  0  0   0  1  0  0  0   0  0  0  1  1
BC1TL          0  1  0  0  0  1   0  1  0  0  0   0  0  0  1  1
BC2TL          0  1  0  0  1  0   0  1  0  0  0   0  0  0  1  1

Bits 31..26: Opcode (bits 27..26 are the Coprocessor Unit Number)
Bits 25..21: BC sub-opcode
Bits 20..16: Branch condition
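
As a rough illustration of how the BCzTL fields combine into one instruction word, the following C sketch assembles the encoding from the coprocessor unit number and the offset. The function name and parameters are hypothetical and do not come from this manual.

```c
#include <stdint.h>

/* Assemble a BCzTL instruction word.
 * encode_bcztl and its parameters are illustrative names only. */
uint32_t encode_bcztl(unsigned z, int16_t offset)
{
    uint32_t opcode = 0x10u | (z & 0x3u);  /* 0100zz: COPz major opcode */
    uint32_t bc     = 0x08u;               /* 01000: BC sub-opcode */
    uint32_t cond   = 0x03u;               /* 00011: BCTL branch condition */

    return (opcode << 26) | (bc << 21) | (cond << 16) | (uint16_t)offset;
}
```

For example, calling the function with z = 1 and an offset of 4 yields the BC1TL word with the offset in the low 16 bits.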
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BEQ 



Branch On Equal 



BEQ 





 31      26 25      21 20      16 15                   0
    BEQ         rs         rt            offset
   000100
      6          5          5              16







Format: 

BEQ rs, rt, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. The contents of general register rs and the contents of 
general register rt are compared. If the two registers are equal, then the 
program branches to the target address, with a delay of one instruction. 



Operation: 




T:   target <- (offset_15)^46 || offset || 0^2
     condition <- (GPR[rs] = GPR[rt])
T+1: if condition then
         PC <- PC + target
     endif



Exceptions: 

None 
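
The branch target computation shared by the branch instructions can be sketched in C as follows. This is a minimal model under the assumption that PC in the Operation section refers to the delay-slot instruction when the target is added; the function name is hypothetical.

```c
#include <stdint.h>

/* Target address of a taken branch located at branch_pc.
 * The 16-bit offset is sign-extended, shifted left two bits, and added to
 * the address of the delay-slot instruction (branch_pc + 4).
 * branch_target is an illustrative name, not part of the manual. */
uint64_t branch_target(uint64_t branch_pc, int16_t offset)
{
    int64_t displacement = (int64_t)offset * 4;  /* sign-extend, then << 2 */

    return branch_pc + 4 + (uint64_t)displacement;
}
```

An offset of -1 therefore branches back to the branch instruction itself, and an offset of 0 targets the delay slot.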
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BEQL 



Branch On Equal Likely 



BEQL 





 31      26 25      21 20      16 15                   0
    BEQL        rs         rt            offset
   010100
      6          5          5              16







Format: 

BEQL rs, rt, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. The contents of general register rs and the contents of 
general register rt are compared. If the two registers are equal, the 
program branches to the target address, with a delay of one instruction. If 
the conditional branch is not taken, the instruction in the branch delay 
slot is nullified. 



Operation: 








T:   target <- (offset_15)^46 || offset || 0^2
     condition <- (GPR[rs] = GPR[rt])
T+1: if condition then
         PC <- PC + target
     else
         NullifyCurrentInstruction
     endif



Exceptions: 

None 
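
The difference between a "likely" branch and an ordinary branch is what happens to the delay slot when the branch is not taken. The C sketch below models BEQL in that respect; the structure and function names are hypothetical, and the model ignores pipeline timing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Outcome of one BEQL instruction (illustrative type, not from the manual). */
typedef struct {
    uint64_t next_pc;        /* address executed after the delay-slot position */
    bool     run_delay_slot; /* false when the delay slot is nullified */
} branch_result;

branch_result beql_step(uint64_t branch_pc, uint64_t rs, uint64_t rt,
                        int16_t offset)
{
    branch_result r;

    if (rs == rt) {
        /* Taken: the delay slot executes, then control moves to the target. */
        r.next_pc = branch_pc + 4 + (uint64_t)((int64_t)offset * 4);
        r.run_delay_slot = true;
    } else {
        /* Not taken: the delay slot is nullified and execution falls through. */
        r.next_pc = branch_pc + 8;
        r.run_delay_slot = false;
    }
    return r;
}
```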
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Branch On Greater Than 
Or Equal To Zero 





 31      26 25      21 20      16 15                   0
   REGIMM       rs        BGEZ           offset
   000001                 00001
      6          5          5              16







Format: 

BGEZ rs, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. If the contents of general register rs have the sign bit 
cleared, then the program branches to the target address, with a delay of 
one instruction. 



Operation: 




T:   target <- (offset_15)^46 || offset || 0^2
     condition <- (GPR[rs]_63 = 0)
T+1: if condition then
         PC <- PC + target
     endif



Exceptions: 

None 
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BGEZAL 



Branch On Greater Than 
Or Equal To Zero And Link 



BGEZAL 





 31      26 25      21 20      16 15                   0
   REGIMM       rs       BGEZAL          offset
   000001                 10001
      6          5          5              16







Format: 

BGEZAL rs, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset shifted left two bits 
and sign-extended. Unconditionally, the address of the instruction after 
the delay slot is placed in the link register, r31. If the contents of general 
register rs have the sign bit cleared, then the program branches to the 
target address, with a delay of one instruction. 

General register rs may not be general register 31, because such an 
instruction is not restartable. An attempt to execute this instruction is not 
trapped, however. 

Operation: 



T:   target <- (offset_15)^46 || offset || 0^2
     condition <- (GPR[rs]_63 = 0)
     GPR[31] <- PC + 8
T+1: if condition then
         PC <- PC + target
     endif







Exceptions: 

None 
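
The following C sketch models the BGEZAL behavior described above: the link register is written whether or not the branch is taken, and the condition tests only the sign bit. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of one BGEZAL instruction (illustrative type, not from the manual). */
typedef struct {
    uint64_t link;    /* value written to r31 */
    bool     taken;   /* true when the branch condition holds */
    uint64_t target;  /* branch target, meaningful when taken */
} bgezal_result;

bgezal_result bgezal_step(uint64_t branch_pc, uint64_t rs, int16_t offset)
{
    bgezal_result r;

    r.link   = branch_pc + 8;      /* instruction after the delay slot */
    r.taken  = (rs >> 63) == 0;    /* sign bit (bit 63) cleared */
    r.target = branch_pc + 4 + (uint64_t)((int64_t)offset * 4);
    return r;
}
```

Because the link value is produced unconditionally, a caller of this model sees r31 change even on the not-taken path, which is why using r31 as rs makes the instruction non-restartable.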
BGEZALL 

Branch On Greater Than 
Or Equal To Zero And Link Likely 

BGEZALL 



 31      26 25      21 20      16 15                   0
   REGIMM       rs       BGEZALL         offset
   000001                 10011
      6          5          5              16



Format: 

BGEZALL rs, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. Unconditionally, the address of the instruction after 
the delay slot is placed in the link register, r31. If the contents of general 
register rs have the sign bit cleared, then the program branches to the 
target address, with a delay of one instruction. General register rs may not 
be general register 31, because such an instruction is not restartable. An 
attempt to execute this instruction is not trapped, however. If the 
conditional branch is not taken, the instruction in the branch delay slot is 
nullified. 



Operation: 



T:   target <- (offset_15)^46 || offset || 0^2
     condition <- (GPR[rs]_63 = 0)
     GPR[31] <- PC + 8
T+1: if condition then
         PC <- PC + target
     else
         NullifyCurrentInstruction
     endif



Exceptions: 

None 
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BGEZL 



Branch On Greater 
Than Or Equal To Zero Likely 





 31      26 25      21 20      16 15                   0
   REGIMM       rs        BGEZL          offset
   000001                 00011
      6          5          5              16







Format: 

BGEZL rs, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. If the contents of general register rs have the sign bit 
cleared, then the program branches to the target address, with a delay of 
one instruction. If the conditional branch is not taken, the instruction in 
the branch delay slot is nullified. 

Operation: 



T:   target <- (offset_15)^46 || offset || 0^2
     condition <- (GPR[rs]_63 = 0)
T+1: if condition then
         PC <- PC + target
     else
         NullifyCurrentInstruction
     endif



Exceptions: 

None 
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BGTZ 



Branch On Greater Than Zero 



BGTZ 





 31      26 25      21 20      16 15                   0
    BGTZ        rs        00000          offset
   000111
      6          5          5              16







Format: 

BGTZ rs, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. The contents of general register rs are compared to 
zero. If the contents of general register rs have the sign bit cleared and are 
not equal to zero, then the program branches to the target address, with a 
delay of one instruction. 

Operation: 



T:   target <- (offset_15)^46 || offset || 0^2
     condition <- (GPR[rs]_63 = 0) and (GPR[rs] ≠ 0^64)
T+1: if condition then
         PC <- PC + target
     endif



Exceptions: 

None 
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BGTZL 



Branch On Greater 
Than Zero Likely 



BGTZL 





 31      26 25      21 20      16 15                   0
   BGTZL        rs        00000          offset
   010111
      6          5          5              16







Format: 

BGTZL rs, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. The contents of general register rs are compared to 
zero. If the contents of general register rs have the sign bit cleared and are 
not equal to zero, then the program branches to the target address, with a 
delay of one instruction. If the conditional branch is not taken, the 
instruction in the branch delay slot is nullified. 

Operation: 



T:   target <- (offset_15)^46 || offset || 0^2
     condition <- (GPR[rs]_63 = 0) and (GPR[rs] ≠ 0^64)
T+1: if condition then
         PC <- PC + target
     else
         NullifyCurrentInstruction
     endif





Exceptions: 

None 
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Branch on Less Than 
Or Equal To Zero 



BLEZ 





 31      26 25      21 20      16 15                   0
    BLEZ        rs        00000          offset
   000110
      6          5          5              16







Format: 

BLEZ rs, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. The contents of general register rs are compared to 
zero. If the contents of general register rs have the sign bit set, or are equal 
to zero, then the program branches to the target address, with a delay of 
one instruction. 

Operation: 



T:   target <- (offset_15)^46 || offset || 0^2
     condition <- (GPR[rs]_63 = 1) or (GPR[rs] = 0^64)
T+1: if condition then
         PC <- PC + target
     endif



Exceptions: 

None 
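
The BGTZ and BLEZ conditions are complements of one another, which the C sketch below makes explicit. It follows the wording of the two Description sections (sign bit cleared and not zero; sign bit set or equal to zero). The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* BGTZ: sign bit cleared and value not equal to zero. */
bool bgtz_condition(uint64_t rs)
{
    return (rs >> 63) == 0 && rs != 0;
}

/* BLEZ: sign bit set, or value equal to zero. */
bool blez_condition(uint64_t rs)
{
    return (rs >> 63) == 1 || rs == 0;
}
```

For any register value exactly one of the two predicates holds.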
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BLEZL 



Branch on Less Than 
Or Equal To Zero Likely 



BLEZL 





 31      26 25      21 20      16 15                   0
   BLEZL        rs        00000          offset
   010110
      6          5          5              16







Format: 

BLEZL rs, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. The contents of general register rs are compared to zero. 
If the contents of general register rs have the sign bit set, or are equal to 
zero, then the program branches to the target address, with a delay of one 
instruction. 

If the conditional branch is not taken, the instruction in the branch 
delay slot is nullified. 

Operation: 



T:   target <- (offset_15)^46 || offset || 0^2
     condition <- (GPR[rs]_63 = 1) or (GPR[rs] = 0^64)
T+1: if condition then
         PC <- PC + target
     else
         NullifyCurrentInstruction
     endif





Exceptions: 

None 
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BLTZ 



Branch On Less Than Zero 



BLTZ 





 31      26 25      21 20      16 15                   0
   REGIMM       rs        BLTZ           offset
   000001                 00000
      6          5          5              16







Format: 

BLTZ rs, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset shifted left two bits 
and sign-extended. If the contents of general register rs have the sign bit 
set, then the program branches to the target address, with a delay of one 
instruction. 



Operation: 



T:   target <- (offset_15)^46 || offset || 0^2
     condition <- (GPR[rs]_63 = 1)
T+1: if condition then
         PC <- PC + target
     endif



Exceptions: 

None 
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BLTZAL 



Branch On Less 
Than Zero And Link 



BLTZAL 





 31      26 25      21 20      16 15                   0
   REGIMM       rs       BLTZAL          offset
   000001                 10000
      6          5          5              16







Format: 

BLTZAL rs, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. Unconditionally, the address of the instruction after 
the delay slot is placed in the link register, r31. If the contents of general 
register rs have the sign bit set, then the program branches to the target 
address, with a delay of one instruction. 

General register rs may not be general register 31, because such an 
instruction is not restartable. An attempt to execute this instruction with 
register 31 specified as rs is not trapped, however. 

Operation: 



T:   target <- (offset_15)^46 || offset || 0^2
     condition <- (GPR[rs]_63 = 1)
     GPR[31] <- PC + 8
T+1: if condition then
         PC <- PC + target
     endif



Exceptions: 

None 
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BLTZALL 



Branch On Less 
Than Zero And Link Likely 



BLTZALL 





 31      26 25      21 20      16 15                   0
   REGIMM       rs       BLTZALL         offset
   000001                 10010
      6          5          5              16







Format: 

BLTZALL rs, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. Unconditionally, the address of the instruction after 
the delay slot is placed in the link register, r31. If the contents of general 
register rs have the sign bit set, then the program branches to the target 
address, with a delay of one instruction. 

General register rs may not be general register 31, because such an 
instruction is not restartable. An attempt to execute this instruction with 
register 31 specified as rs is not trapped, however. If the conditional 
branch is not taken, the instruction in the branch delay slot is nullified. 

Operation: 



T:   target <- (offset_15)^46 || offset || 0^2
     condition <- (GPR[rs]_63 = 1)
     GPR[31] <- PC + 8
T+1: if condition then
         PC <- PC + target
     else
         NullifyCurrentInstruction
     endif



Exceptions: 

None 
BLTZL Branch On Less Than Zero Likely BLTZL 





 31      26 25      21 20      16 15                   0
   REGIMM       rs        BLTZL          offset
   000001                 00010
      6          5          5              16







Format: 

BLTZL rs, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. If the contents of general register rs have the sign bit 
set, then the program branches to the target address, with a delay of one 
instruction. If the conditional branch is not taken, the instruction in the 
branch delay slot is nullified. 



Operation: 










T:   target <- (offset_15)^46 || offset || 0^2
     condition <- (GPR[rs]_63 = 1)
T+1: if condition then
         PC <- PC + target
     else
         NullifyCurrentInstruction
     endif



Exceptions: 

None 
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BNE 



Branch On Not Equal 



BNE 





 31      26 25      21 20      16 15                   0
    BNE         rs         rt            offset
   000101
      6          5          5              16







Format: 

BNE rs, rt, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. The contents of general register rs and the contents of 
general register rt are compared. If the two registers are not equal, then 
the program branches to the target address, with a delay of one 
instruction. 



Operation: 




T:   target <- (offset_15)^46 || offset || 0^2
     condition <- (GPR[rs] ≠ GPR[rt])
T+1: if condition then
         PC <- PC + target
     endif



Exceptions: 

None 






CPU Instruction Set Details 



Appendix A 



BNEL Branch On Not Equal Likely 



BNEL 





 31      26 25      21 20      16 15                   0
    BNEL        rs         rt            offset
   010101
      6          5          5              16







Format: 

BNEL rs, rt, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset shifted left two bits 
and sign-extended. The contents of general register rs and the contents of 
general register rt are compared. If the two registers are not equal, then 
the program branches to the target address, with a delay of one 
instruction. 

If the conditional branch is not taken, the instruction in the branch 
delay slot is nullified. 

Operation: 



T:   target <- (offset_15)^46 || offset || 0^2
     condition <- (GPR[rs] ≠ GPR[rt])
T+1: if condition then
         PC <- PC + target
     else
         NullifyCurrentInstruction
     endif



Exceptions: 

None 
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BREAK 



Breakpoint 



BREAK 





 31      26 25                              6 5         0
  SPECIAL                  code                 BREAK
   000000                                       001101
      6                     20                     6







Format: 

BREAK 

Description: 

A breakpoint trap occurs, immediately and unconditionally 
transferring control to the exception handler. 

The code field is available for use as software parameters, but is 
retrieved by the exception handler only by loading the contents of the 
memory word containing the instruction. 

Operation: 



T: BreakpointException 



Exceptions: 

Breakpoint exception 
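
Since the code field is only available by reading the instruction word back from memory, an exception handler would typically decode it along these lines. This is a minimal C sketch; the function names are hypothetical, and how the handler obtains the instruction word is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when instr is a BREAK: SPECIAL opcode (000000) in bits 31..26 and
 * function field 001101 in bits 5..0. Illustrative helper only. */
bool is_break(uint32_t instr)
{
    return (instr >> 26) == 0 && (instr & 0x3Fu) == 0x0Du;
}

/* Extract the 20-bit code field held in bits 25..6. */
uint32_t break_code(uint32_t instr)
{
    return (instr >> 6) & 0xFFFFFu;
}
```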
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CACHE 



Cache 



CACHE 





 31      26 25      21 20      16 15                   0
   CACHE       base        op            offset
   101111
      6          5          5              16







Format: 

CACHE op, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The virtual address is translated 
to a physical address using the TLB, and the 5-bit sub-opcode specifies a 
cache operation for that address. 

If CP0 is not usable (User or Supervisor mode) and the CP0 enable bit in 
the Status register is clear, a coprocessor unusable exception is taken. 
The operation of this instruction on any operation/cache combination not 
listed below is undefined. The operation of this instruction on uncached 
addresses is also undefined. 

The R4600/R4700 uses only the tag comparisons, not the valid bits, to 
choose which data it supplies to the instruction unit. This makes it 
important that the tags of the A and B sets are never the same. 

The Index operation uses part of the virtual address to specify a cache 
block, with vAddr_13 selecting the set to access. 

For a primary cache of 16KB with 32 bytes per tag, vAddr_12..5 specifies 
the block. 

Index Load Tag also uses vAddr_4..3 to select the doubleword for reading 
parity. When the CE bit of the Status register is set, Hit WriteBack, Hit 
WriteBack Invalidate, Index WriteBack Invalidate, and Fill also use 
vAddr_4..3 to select the doubleword that has its parity modified. This 
operation is performed unconditionally. 
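
The address fields used by the Index operations can be illustrated with a short C sketch. It assumes the 16KB, 32-bytes-per-tag primary cache mentioned above; the structure and function names are hypothetical.

```c
#include <stdint.h>

/* Fields an Index cache operation takes from the virtual address
 * (illustrative type, not from the manual). */
typedef struct {
    unsigned set;    /* vAddr bit 13: selects the set */
    unsigned block;  /* vAddr bits 12..5: selects the cache block */
    unsigned dword;  /* vAddr bits 4..3: selects the doubleword */
} cache_index;

cache_index decode_cache_index(uint64_t vaddr)
{
    cache_index ci;

    ci.set   = (unsigned)((vaddr >> 13) & 0x1u);
    ci.block = (unsigned)((vaddr >> 5) & 0xFFu);
    ci.dword = (unsigned)((vaddr >> 3) & 0x3u);
    return ci;
}
```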

The Hit operation accesses the specified cache as normal data 
references, and performs the specified operation if the cache block 
contains valid data with the specified physical address (a hit). If both sets 
are invalid or contain different addresses (a miss), no operation is 
performed. 

Write back from a primary cache goes to memory. The address to be 
written is specified by the cache tag and not the translated physical 
address. 

TLB Refill and TLB Invalid exceptions can occur on any operation. For 
Index operations (where the physical address is used to index the cache 
but need not match the cache tag) unmapped addresses may be used to 
avoid TLB exceptions. This operation never causes TLB Modified or Virtual 
Coherency exceptions. 

Bits 17..16 of the instruction specify the cache as follows: 

Code   Name   Cache
0      I      primary instruction
1      D      primary data
2-3    NA     Undefined
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Bits 20.. 18 (this value is listed under the Code column) of the 
instruction specify the operation as follows: 



Code  Caches  Name                   Operation

0     I       Index Invalidate       Set the cache state of the cache block to
                                     Invalid. Index_Invalidate_I writes the
                                     physical address of the cache op into the
                                     tag when it clears the valid bit, which is
                                     different from the R4000.

0     D       Index WriteBack        Examine the cache state and W bit of the
              Invalidate             primary data cache block at the index
                                     specified by the virtual address. If the
                                     state is not Invalid and the W bit is set,
                                     then write back the block to memory. The
                                     address to write is taken from the primary
                                     cache tag. Set cache state of primary cache
                                     block to Invalid.

1     I, D    Index Load Tag         Read the tag for the cache block at the
                                     specified index and place it into the TagLo
                                     CP0 registers, ignoring parity errors. Also
                                     load the data parity bits into the ECC
                                     register.

2     I, D    Index Store Tag        Write the tag for the cache block at the
                                     specified index from the TagLo and TagHi
                                     CP0 registers.

3     D       Create Dirty           This operation is used to avoid loading
              Exclusive              data needlessly from memory when writing
                                     new contents into an entire cache block. If
                                     the cache block does not contain the
                                     specified address, and the block is dirty,
                                     write it back to the memory. In all cases,
                                     set the cache block tag to the specified
                                     physical address, set the cache state to
                                     Dirty Exclusive.

4     I, D    Hit Invalidate         If the cache block contains the specified
                                     address, mark the cache block invalid.

5     D       Hit WriteBack          If the cache block contains the specified
              Invalidate             address, write back the data if it is
                                     dirty, and mark the cache block invalid.

5     I       Fill                   Fill the primary instruction cache block
                                     from memory. If the CE bit of the Status
                                     register is set, the contents of the ECC
                                     register is used instead of the computed
                                     parity bits for addressed doubleword when
                                     written to the instruction cache. Uses bit
                                     13 to pick the set.

6     D       Hit WriteBack          If the cache block contains the specified
                                     address, and the W bit is set, write back
                                     the data to memory and clear the W bit.

6     I       Hit WriteBack          If the cache block contains the specified
                                     address, write back the data
                                     unconditionally.



Operation: 



T:   vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
     (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
     CacheOp (op, vAddr, pAddr)



Exceptions: 

Coprocessor unusable exception 
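
The 5-bit op field packs the operation code (bits 20..18) and the cache selector (bits 17..16). The C sketch below shows one way to build and split a CACHE instruction word under that layout; all function names are hypothetical.

```c
#include <stdint.h>

/* Build a CACHE instruction word. code is the 3-bit operation (bits 20..18),
 * cache is the 2-bit cache selector (bits 17..16). Illustrative helper only. */
uint32_t encode_cache(unsigned base, unsigned code, unsigned cache,
                      int16_t offset)
{
    uint32_t op = ((code & 0x7u) << 2) | (cache & 0x3u);

    return (0x2Fu << 26)                      /* 101111: CACHE opcode */
         | ((uint32_t)(base & 0x1Fu) << 21)
         | (op << 16)
         | (uint16_t)offset;
}

/* Cache selector: 0 = primary instruction, 1 = primary data. */
unsigned cache_op_target(uint32_t instr)
{
    return (instr >> 16) & 0x3u;
}

/* Operation code as listed in the Code column of the operation table. */
unsigned cache_op_code(uint32_t instr)
{
    return (instr >> 18) & 0x7u;
}
```

For instance, code 5 with cache selector 1 corresponds to Hit WriteBack Invalidate on the primary data cache.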
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CFCz 



Move Control From 
Coprocessor 



CFCz 





 31      26 25      21 20      16 15      11 10               0
    COPz        CF         rt         rd       000 0000 0000
   0100xx*     00010
      6          5          5          5             11







Format: 

CFCz rt, rd 

Description: 

The contents of coprocessor control register rd of coprocessor unit z are 
loaded into general register rt. 

This instruction is not valid for CP0. 



Operation: 



T:   data <- (CCR[z,rd]_31)^32 || CCR[z,rd]
T+1: GPR[rt] <- data



Exceptions: 

Coprocessor unusable exception 

*Opcode Bit Encoding: 

CFCz   Bit #  31 30 29 28 27 26  25 24 23 22 21
CFC1           0  1  0  0  0  1   0  0  0  1  0
CFC2           0  1  0  0  1  0   0  0  0  1  0

Bits 31..26: Opcode (bits 27..26 are the Coprocessor Unit Number)
Bits 25..21: Coprocessor Suboperation
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COPz 



Coprocessor Operation 



COPz 





 31      26 25 24                                       0
    COPz    CO                  cofun
   0100xx*   1
      6      1                    25







Format: 

COPz cofun 

Description: 

A coprocessor operation is performed. The operation may specify and 
reference internal coprocessor registers, and may change the state of the 
coprocessor condition line, but does not modify state within the processor 
or the cache/memory system. Details of coprocessor operations are 
contained in Appendix B. 

Operation: 



CoprocessorOperation (z, cofun) 



Exceptions: 

Coprocessor unusable exception 

Coprocessor interrupt or Floating-Point Exception 

"Opcode Bit Encoding: 



COPz   Bit #  31 30 29 28 27 26  25
COP0           0  1  0  0  0  0   1
COP1           0  1  0  0  0  1   1
COP2           0  1  0  0  1  0   1

Bits 31..26: Opcode (bits 27..26 are the Coprocessor Unit Number)
Bit 25:      CO sub-opcode (see end of Appendix A)
CTCz Move Control To Coprocessor CTCz 



 31      26 25      21 20      16 15      11 10               0
    COPz        CT         rt         rd       000 0000 0000
   0100xx*     00110
      6          5          5          5             11



Format: 

CTCz rt, rd 

Description: 

The contents of general register rt are loaded into control register rd of 
coprocessor unit z. 

This instruction is not valid for CP0. 



Operation: 



T:   data <- GPR[rt]
T+1: CCR[z,rd] <- data



Exceptions: 

Coprocessor unusable 
NOTE: *See "CPU Instruction Opcode Bit Encoding" at the end of Appendix A. 
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DADD 



Doubleword Add 



DADD 



 31      26 25      21 20      16 15      11 10       6 5         0
  SPECIAL       rs         rt         rd       00000       DADD
   000000                                                  101100
      6          5          5          5          5           6



Format: 

DADD rd, rs, rt 

Description: 

The contents of general register rs and the contents of general register 
rt are added to form the result. The result is placed into general register rd. 

An overflow exception occurs if the carries out of bits 62 and 63 differ 
(2's complement overflow). The destination register rd is not modified 
when an integer overflow exception occurs. 

Operation: 



T:   GPR[rd] <- GPR[rs] + GPR[rt]



Exceptions: 

Integer overflow exception 
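
The overflow rule for DADD can be modeled in C without relying on undefined signed overflow. The sketch below is an assumption-laden model, not hardware behavior: it reports overflow through its return value and leaves the destination untouched in that case, mirroring the description. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of DADD. Returns false when a 2's complement overflow occurs, in
 * which case *rd is left unmodified. Illustrative helper only. */
bool dadd(int64_t rs, int64_t rt, int64_t *rd)
{
    uint64_t a = (uint64_t)rs;
    uint64_t b = (uint64_t)rt;
    uint64_t sum = a + b;  /* unsigned arithmetic wraps, so this is defined */

    /* Overflow when both operands share a sign and the sum's sign differs,
     * which is equivalent to the carries out of bits 62 and 63 differing. */
    if (((~(a ^ b)) & (a ^ sum)) >> 63)
        return false;

    *rd = (int64_t)sum;
    return true;
}
```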
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DADDI Doubleword Add Immediate DADDI 





 31      26 25      21 20      16 15                   0
   DADDI        rs         rt           immediate
   011000
      6          5          5              16







Format: 

DADDI rt, rs, immediate 

Description: 

The 16-bit immediate is sign-extended and added to the contents of 
general register rs to form the result. The result is placed into general 
register rt. 

An overflow exception occurs if carries out of bits 62 and 63 differ (2's 
complement overflow). The destination register rt is not modified when an 
integer overflow exception occurs. 

Operation: 



T:   GPR[rt] <- GPR[rs] + (immediate_15)^48 || immediate_15..0



Exceptions: 

Integer overflow exception 
DADDIU 

Doubleword Add 
Immediate Unsigned 

DADDIU 





 31      26 25      21 20      16 15                   0
   DADDIU       rs         rt           immediate
   011001
      6          5          5              16







Format: 

DADDIU rt, rs, immediate 

Description: 

The 16-bit immediate is sign-extended and added to the contents of 
general register rs to form the result. The result is placed into general 
register rt. No integer overflow exception occurs under any circumstances. 

The only difference between this instruction and the DADDI 
instruction is that DADDIU never causes an overflow exception. 

Operation: 



T:   GPR[rt] <- GPR[rs] + (immediate_15)^48 || immediate_15..0



Exceptions: 

None 
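
Despite the "Unsigned" in its name, DADDIU still sign-extends the immediate; it differs from DADDI only in never trapping. A minimal C sketch of that behavior, with a hypothetical function name, is:

```c
#include <stdint.h>

/* Model of DADDIU: sign-extend the 16-bit immediate to 64 bits and add it
 * to rs, wrapping modulo 2^64. Illustrative helper only. */
uint64_t daddiu(uint64_t rs, uint16_t immediate)
{
    uint64_t extended = immediate;

    /* Replicate bit 15 into bits 63..16: (immediate_15)^48 || immediate. */
    if (immediate & 0x8000u)
        extended |= ~(uint64_t)0xFFFFu;

    return rs + extended;  /* no overflow exception under any circumstances */
}
```

An immediate of 0xFFFF therefore subtracts one from rs rather than adding 65535.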
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DADDU Doubleword Add Unsigned DADDU 





 31      26 25      21 20      16 15      11 10       6 5         0
  SPECIAL       rs         rt         rd       00000       DADDU
   000000                                                  101101
      6          5          5          5          5           6






Format: 

DADDU rd, rs, rt 















Description: 

The contents of general register rs and the contents of general register 
rt are added to form the result. The result is placed into general register rd. 

No overflow exception occurs under any circumstances. 

The only difference between this instruction and the DADD instruction 
is that DADDU never causes an overflow exception. 

Operation: 



T:   GPR[rd] <- GPR[rs] + GPR[rt]



Exceptions: 

None 
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DDIV 



Doubleword Divide 



DDIV 



 31      26 25      21 20      16 15                  6 5         0
  SPECIAL       rs         rt         00 0000 0000         DDIV
   000000                                                  011110
      6          5          5              10                 6



Format: 

DDIV rs, rt 

Description: 

The contents of general register rs are divided by the contents of 
general register rt, treating both operands as 2's complement values. No 
overflow exception occurs under any circumstances, and the result of this 
operation is undefined when the divisor is zero. 

This instruction is typically followed by additional instructions to 
check for a zero divisor and for overflow. 

When the operation completes, the quotient word of the double result 
is loaded into special register LO, and the remainder word of the double 
result is loaded into special register HI. 

If either of the two preceding instructions is MFHI or MFLO, the results 
of those instructions are undefined. Correct operation requires separating 
reads of HI or LO from writes by two or more instructions. 

Operation: 



T-2: LO <- undefined
     HI <- undefined
T-1: LO <- undefined
     HI <- undefined
T:   LO <- GPR[rs] div GPR[rt]
     HI <- GPR[rs] mod GPR[rt]



Exceptions: 

None 
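
Because DDIV never traps, software is expected to check for a zero divisor and for overflow itself. The C sketch below shows what those checks amount to. It is a model under stated assumptions: the names are hypothetical, and the quotient and remainder follow C's truncating division, which this section does not itself specify.

```c
#include <stdint.h>

/* Outcome of the software checks around a signed 64-bit divide
 * (illustrative type, not from the manual). */
typedef enum { DIV_OK, DIV_BY_ZERO, DIV_OVERFLOW } div_status;

div_status ddiv_checked(int64_t rs, int64_t rt, int64_t *lo, int64_t *hi)
{
    if (rt == 0)
        return DIV_BY_ZERO;   /* hardware result would be undefined */
    if (rs == INT64_MIN && rt == -1)
        return DIV_OVERFLOW;  /* quotient does not fit in 64 bits */

    *lo = rs / rt;            /* quotient, as loaded into LO */
    *hi = rs % rt;            /* remainder, as loaded into HI */
    return DIV_OK;
}
```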
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DDIVU Doubleword Divide Unsigned DDIVU 



 31      26 25      21 20      16 15                  6 5         0
  SPECIAL       rs         rt         00 0000 0000         DDIVU
   000000                                                  011111
      6          5          5              10                 6



Format: 

DDIVU rs, rt 

Description: 

The contents of general register rs are divided by the contents of 
general register rt, treating both operands as unsigned values. No integer 
overflow exception occurs under any circumstances, and the result of this 
operation is undefined when the divisor is zero. 

This instruction is typically followed by additional instructions to 
check for a zero divisor. 

When the operation completes, the quotient word of the double result 
is loaded into special register LO, and the remainder word of the double 
result is loaded into special register HI. 

If either of the two preceding instructions is MFHI or MFLO, the results 
of those instructions are undefined. Correct operation requires separating 
reads of HI or LO from writes by two or more instructions. 

Operation: 



T-2: LO <- undefined
     HI <- undefined
T-1: LO <- undefined
     HI <- undefined
T:   LO <- (0 || GPR[rs]) div (0 || GPR[rt])
     HI <- (0 || GPR[rs]) mod (0 || GPR[rt])



Exceptions: 

None 
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DIV 



Divide 



DIV 





 31      26 25      21 20      16 15                  6 5         0
  SPECIAL       rs         rt         00 0000 0000          DIV
   000000                                                  011010
      6          5          5              10                 6





Format: 

DIV rs, rt 

Description: 

The contents of general register rs are divided by the contents of 
general register rt, treating both operands as 2's complement values. No 
overflow exception occurs under any circumstances, and the result of this 
operation is undefined when the divisor is zero. 

The operands must be valid sign-extended, 32-bit values. 

This instruction is typically followed by additional instructions to 
check for a zero divisor and for overflow. 

When the operation completes, the quotient word of the double result 
is loaded into special register LO, and the remainder word of the double 
result is loaded into special register HI. 

If either of the two preceding instructions is MFHI or MFLO, the results 
of those instructions are undefined. Correct operation requires separating 
reads of HI or LO from writes by two or more instructions. 



Operation: 






T-2: LO <- undefined
     HI <- undefined
T-1: LO <- undefined
     HI <- undefined
T:   q  <- GPR[rs]_31..0 div GPR[rt]_31..0
     r  <- GPR[rs]_31..0 mod GPR[rt]_31..0
     LO <- (q_31)^32 || q_31..0
     HI <- (r_31)^32 || r_31..0



Exceptions: 

None 
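
The 32-bit DIV operates on the low words of its operands and sign-extends both results to 64 bits. A hedged C model of that data flow follows. The helper names are hypothetical, the zero-divisor case is reported rather than left undefined, and C's truncating division is assumed for the quotient and remainder.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low 32 bits of x to 64 bits: (x_31)^32 || x_31..0. */
static int64_t sext32(uint64_t x)
{
    int64_t low = (int64_t)(x & 0xFFFFFFFFu);

    return (x & 0x80000000u) ? low - 4294967296LL : low;
}

/* Model of DIV. Returns false for a zero divisor, where the hardware result
 * is undefined. Illustrative helper only. */
bool div_word(uint64_t rs, uint64_t rt, int64_t *lo, int64_t *hi)
{
    int64_t a = sext32(rs);
    int64_t b = sext32(rt);

    if (b == 0)
        return false;

    *lo = sext32((uint64_t)(a / b));  /* quotient word, sign-extended */
    *hi = sext32((uint64_t)(a % b));  /* remainder word, sign-extended */
    return true;
}
```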
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DIVU 



Divide Unsigned 



DIVU 



 31      26 25      21 20      16 15                  6 5         0
  SPECIAL       rs         rt         00 0000 0000         DIVU
   000000                                                  011011
      6          5          5              10                 6



Format: 

DIVU rs, rt 

Description: 

The contents of general register rs are divided by the contents of 
general register rt, treating both operands as unsigned values. No integer 
overflow exception occurs under any circumstances, and the result of this 
operation is undefined when the divisor is zero. 

The operands must be valid sign-extended, 32-bit values. 

This instruction is typically followed by additional instructions to 
check for a zero divisor. 

When the operation completes, the quotient word of the double result 
is loaded into special register LO, and the remainder word of the double 
result is loaded into special register HI. 

If either of the two preceding instructions is MFHI or MFLO, the results 
of those instructions are undefined. Correct operation requires separating 
reads of HI or LO from writes by two or more instructions. 

Operation: 



T-2: LO <- undefined
     HI <- undefined
T-1: LO <- undefined
     HI <- undefined
T:   q  <- (0 || GPR[rs]_31..0) div (0 || GPR[rt]_31..0)
     r  <- (0 || GPR[rs]_31..0) mod (0 || GPR[rt]_31..0)
     LO <- (q_31)^32 || q_31..0
     HI <- (r_31)^32 || r_31..0



Exceptions: 

None 
DMFC0 

Doubleword Move From 
System Control Coprocessor 

DMFC0 



 31      26 25      21 20      16 15      11 10               0
    COP0        DMF        rt         rd       000 0000 0000
   010000      00001
      6          5          5          5             11



Format: 

DMFC0 rt, rd 

Description: 

The contents of coprocessor register rd of the CP0 are loaded into 
general register rt. 

This operation is defined in kernel mode regardless of the setting of the 
Status.KX bit. Execution of this instruction in supervisor mode with 
Status.SX = 0, or in user mode with Status.UX = 0, causes a reserved 
instruction exception. All 64 bits of the general register destination are 
written from the coprocessor register source. The operation of DMFC0 on a 
32-bit coprocessor register is undefined. 

Operation: 



T:      data <- CPR[0,rd]
T+1:    GPR[rt] <- data



Exceptions: 

Coprocessor unusable exception 

Reserved instruction exception for supervisor mode with Status.SX = 0 
or user mode with Status.UX = 0. 
DMTC0         Doubleword Move To System Control Coprocessor         DMTC0

  31     26 25    21 20    16 15    11 10             0
|   COP0   |  DMT   |   rt   |   rd   |       0        |
|  010000  | 00101  |        |        | 000 0000 0000  |
      6        5        5        5           11



Format: 

DMTC0 rt, rd 

Description: 

The contents of general register rt are loaded into coprocessor register 
rd of the CP0. 

This operation is defined in kernel mode regardless of the setting of the 
Status.KX bit. Execution of this instruction in supervisor mode with 
Status.SX = 0, or in user mode with Status.UX = 0, causes a reserved 
instruction exception. 

All 64 bits of the coprocessor register are written from the general 
register source. The operation of DMTC0 on a 32-bit coprocessor register 
is undefined. 

Because the state of the virtual address translation system may be 
altered by this instruction, the operation of load instructions, store 
instructions, and TLB operations immediately prior to and after this 
instruction is undefined. 

Operation: 

T:      data <- GPR[rt]
T+1:    CPR[0,rd] <- data

Exceptions: 

Reserved instruction exception for supervisor mode with Status.SX = 0 
or user mode with Status.UX = 0. 
DMULT                     Doubleword Multiply                     DMULT

  31     26 25    21 20    16 15            6 5       0
| SPECIAL  |   rs   |   rt   |      0       |  DMULT  |
|  000000  |        |        | 00 0000 0000 | 011100  |
      6        5        5          10            6



Format: 

DMULT rs, rt 

Description: 

The contents of general registers rs and rt are multiplied, treating both 
operands as 2's complement values. No integer overflow exception occurs 
under any circumstances. 

When the operation completes, the low-order word of the double result 
is loaded into special register LO, and the high-order word of the double 
result is loaded into special register HI. 

If either of the two preceding instructions is MFHI or MFLO, the results 
of these instructions are undefined. Correct operation requires separating 
reads of HI or LO from writes by a minimum of two other instructions. 

Operation:

T-2:    LO <- undefined
        HI <- undefined
T-1:    LO <- undefined
        HI <- undefined
T:      t  <- GPR[rs] * GPR[rt]
        LO <- t_{63..0}
        HI <- t_{127..64}



Exceptions: 

None 
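
The split of the 128-bit product into LO and HI can be illustrated with arbitrary-precision integers. This is a sketch under stated assumptions: `to_signed64`, `dmult`, and `dmultu` are hypothetical helper names, and registers are modeled as unsigned 64-bit images. The unsigned variant corresponds to DMULTU, which differs only in how the operands are interpreted.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1

def to_signed64(value):
    """Interpret a 64-bit register image as a 2's complement integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value

def dmult(rs, rt):
    """Signed 64 x 64 -> 128-bit multiply; returns (LO, HI)."""
    t = (to_signed64(rs) * to_signed64(rt)) & MASK128
    return t & MASK64, (t >> 64) & MASK64   # t_{63..0}, t_{127..64}

def dmultu(rs, rt):
    """Unsigned 64 x 64 -> 128-bit multiply; returns (LO, HI)."""
    t = (rs & MASK64) * (rt & MASK64)
    return t & MASK64, t >> 64
```

With rs holding all ones, the signed form treats the operand as -1 and HI becomes all ones for a small positive rt, while the unsigned form yields a small positive HI.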
DMULTU                Doubleword Multiply Unsigned                DMULTU

  31     26 25    21 20    16 15            6 5       0
| SPECIAL  |   rs   |   rt   |      0       | DMULTU  |
|  000000  |        |        | 00 0000 0000 | 011101  |
      6        5        5          10            6

Format:

DMULTU rs, rt

Description: 

The contents of general register rs and the contents of general register 
rt are multiplied, treating both operands as unsigned values. No overflow 
exception occurs under any circumstances. 

When the operation completes, the low-order word of the double result 
is loaded into special register LO, and the high-order word of the double 
result is loaded into special register HI. 

If either of the two preceding instructions is MFHI or MFLO, the results 
of these instructions are undefined. Correct operation requires separating 
reads of HI or LO from writes by a minimum of two instructions. 

Operation:

T-2:    LO <- undefined
        HI <- undefined
T-1:    LO <- undefined
        HI <- undefined
T:      t  <- (0 || GPR[rs]) * (0 || GPR[rt])
        LO <- t_{63..0}
        HI <- t_{127..64}



Exceptions: 

None 
DSLL                Doubleword Shift Left Logical                DSLL

  31     26 25    21 20    16 15    11 10     6 5       0
| SPECIAL  |   0    |   rt   |   rd   |   sa   |  DSLL   |
|  000000  | 00000  |        |        |        | 111000  |
      6        5        5        5        5        6



Format: 

DSLL rd, rt, sa 

Description: 

The contents of general register rt are shifted left by sa bits, inserting 
zeros into the low-order bits. The result is placed in register rd. 

Operation: 

T:      s <- 0 || sa
        GPR[rd] <- GPR[rt]_{(63-s)..0} || 0^s



Exceptions: 

None 






DSLLV           Doubleword Shift Left Logical Variable           DSLLV

  31     26 25    21 20    16 15    11 10     6 5       0
| SPECIAL  |   rs   |   rt   |   rd   |   0    |  DSLLV  |
|  000000  |        |        |        | 00000  | 010100  |
      6        5        5        5        5        6



Format: 

DSLLV rd, rt, rs 

Description: 

The contents of general register rt are shifted left by the number of bits 
specified by the low-order six bits contained in general register rs, inserting 
zeros into the low-order bits. The result is placed in register rd. 

Operation: 



T:      s <- GPR[rs]_{5..0}
        GPR[rd] <- GPR[rt]_{(63-s)..0} || 0^s



Exceptions: 

None 
DSLL32            Doubleword Shift Left Logical + 32            DSLL32

  31     26 25    21 20    16 15    11 10     6 5       0
| SPECIAL  |   0    |   rt   |   rd   |   sa   | DSLL32  |
|  000000  | 00000  |        |        |        | 111100  |
      6        5        5        5        5        6



Format: 

DSLL32 rd, rt, sa 

Description: 

The contents of general register rt are shifted left by 32+sa bits, 
inserting zeros into the low-order bits. The result is placed in register rd. 

Operation: 



T:      s <- 1 || sa
        GPR[rd] <- GPR[rt]_{(63-s)..0} || 0^s



Exceptions: 

None 
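
The three doubleword left shifts differ only in how the shift amount s is formed. The sketch below shows this side by side; the function names are hypothetical and registers are modeled as 64-bit Python integers.

```python
MASK64 = (1 << 64) - 1

def dsll(rt, sa):
    """s = 0 || sa: shift left by the 5-bit sa field (0..31)."""
    s = sa & 0x1F
    return (rt << s) & MASK64

def dsllv(rt, rs):
    """s = GPR[rs]_{5..0}: shift left by the low six bits of rs (0..63)."""
    s = rs & 0x3F
    return (rt << s) & MASK64

def dsll32(rt, sa):
    """s = 1 || sa: shift left by 32 + sa (32..63)."""
    s = 32 + (sa & 0x1F)
    return (rt << s) & MASK64
```

Prepending a 1 bit to the 5-bit sa field is what lets DSLL32 reach shift amounts of 32 through 63 without widening the field in the instruction word.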
DSRA              Doubleword Shift Right Arithmetic              DSRA

  31     26 25    21 20    16 15    11 10     6 5       0
| SPECIAL  |   0    |   rt   |   rd   |   sa   |  DSRA   |
|  000000  | 00000  |        |        |        | 111011  |
      6        5        5        5        5        6



Format: 

DSRA rd, rt, sa 

Description: 

The contents of general register rt are shifted right by sa bits, sign- 
extending the high-order bits. The result is placed in register rd. 

Operation: 



T:      s <- 0 || sa
        GPR[rd] <- (GPR[rt]_{63})^s || GPR[rt]_{63..s}



Exceptions: 

None 
DSRAV         Doubleword Shift Right Arithmetic Variable         DSRAV

  31     26 25    21 20    16 15    11 10     6 5       0
| SPECIAL  |   rs   |   rt   |   rd   |   0    |  DSRAV  |
|  000000  |        |        |        | 00000  | 010111  |
      6        5        5        5        5        6



Format: 

DSRAV rd, rt, rs 

Description: 

The contents of general register rt are shifted right by the number of 
bits specified by the low-order six bits of general register rs, sign-extending 
the high-order bits. The result is placed in register rd. 

Operation: 



T:      s <- GPR[rs]_{5..0}
        GPR[rd] <- (GPR[rt]_{63})^s || GPR[rt]_{63..s}



Exceptions: 

None 
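
The arithmetic right shift replicates bit 63 into the vacated positions. A minimal sketch of that replication follows, using hypothetical helper names and modeling registers as unsigned 64-bit integers; it covers the fixed (DSRA), variable (DSRAV), and plus-32 (DSRA32) forms.

```python
MASK64 = (1 << 64) - 1

def shift_right_arithmetic(value, s):
    """Compute (value_63)^s || value_{63..s} for 0 <= s <= 63."""
    value &= MASK64
    sign = value >> 63
    # s copies of the sign bit placed in the top s bit positions.
    fill = (((1 << s) - 1) << (64 - s)) if sign else 0
    return fill | (value >> s)

def dsra(rt, sa):
    return shift_right_arithmetic(rt, sa & 0x1F)

def dsrav(rt, rs):
    return shift_right_arithmetic(rt, rs & 0x3F)

def dsra32(rt, sa):
    return shift_right_arithmetic(rt, 32 + (sa & 0x1F))
```

Dropping the `fill` term gives the logical right shifts (DSRL, DSRLV, DSRL32), which insert zeros instead of copies of the sign bit.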
DSRA32          Doubleword Shift Right Arithmetic + 32          DSRA32

  31     26 25    21 20    16 15    11 10     6 5       0
| SPECIAL  |   0    |   rt   |   rd   |   sa   | DSRA32  |
|  000000  | 00000  |        |        |        | 111111  |
      6        5        5        5        5        6



Format: 

DSRA32 rd, rt, sa 

Description: 

The contents of general register rt are shifted right by 32+sa bits, sign- 
extending the high-order bits. The result is placed in register rd. 

Operation: 



T:      s <- 1 || sa
        GPR[rd] <- (GPR[rt]_{63})^s || GPR[rt]_{63..s}



Exceptions: 

None 
DSRL                Doubleword Shift Right Logical                DSRL

  31     26 25    21 20    16 15    11 10     6 5       0
| SPECIAL  |   0    |   rt   |   rd   |   sa   |  DSRL   |
|  000000  | 00000  |        |        |        | 111010  |
      6        5        5        5        5        6



Format: 

DSRL rd, rt, sa 

Description: 

The contents of general register rt are shifted right by sa bits, inserting 
zeros into the high-order bits. The result is placed in register rd. 

Operation: 

T:      s <- 0 || sa
        GPR[rd] <- 0^s || GPR[rt]_{63..s}

Exceptions: 

None 
DSRLV           Doubleword Shift Right Logical Variable           DSRLV

  31     26 25    21 20    16 15    11 10     6 5       0
| SPECIAL  |   rs   |   rt   |   rd   |   0    |  DSRLV  |
|  000000  |        |        |        | 00000  | 010110  |
      6        5        5        5        5        6



Format: 

DSRLV rd, rt, rs 

Description: 

The contents of general register rt are shifted right by the number of 
bits specified by the low-order six bits of general register rs, inserting zeros 
into the high-order bits. The result is placed in register rd. 

Operation: 



T:      s <- GPR[rs]_{5..0}
        GPR[rd] <- 0^s || GPR[rt]_{63..s}



Exceptions: 

None 
DSRL32            Doubleword Shift Right Logical + 32            DSRL32

  31     26 25    21 20    16 15    11 10     6 5       0
| SPECIAL  |   0    |   rt   |   rd   |   sa   | DSRL32  |
|  000000  | 00000  |        |        |        | 111110  |
      6        5        5        5        5        6



Format: 

DSRL32 rd, rt, sa 

Description: 

The contents of general register rt are shifted right by 32+sa bits, 
inserting zeros into the high-order bits. The result is placed in register rd. 

Operation: 



T:      s <- 1 || sa
        GPR[rd] <- 0^s || GPR[rt]_{63..s}



Exceptions: 

None 
DSUB                      Doubleword Subtract                      DSUB

  31     26 25    21 20    16 15    11 10     6 5       0
| SPECIAL  |   rs   |   rt   |   rd   |   0    |  DSUB   |
|  000000  |        |        |        | 00000  | 101110  |
      6        5        5        5        5        6



Format: 

DSUB rd, rs, rt 

Description: 

The contents of general register rt are subtracted from the contents of 
general register rs to form a result. The result is placed into general 
register rd. 

The only difference between this instruction and the DSUBU 
instruction is that DSUBU never traps on overflow. 

An integer overflow exception takes place if the carries out of bits 62 
and 63 differ (2's complement overflow). The destination register rd is not 
modified when an integer overflow exception occurs. 

Operation: 



T:      GPR[rd] <- GPR[rs] - GPR[rt]



Exceptions: 

Integer overflow exception 
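
The overflow condition for DSUB (carries out of bits 62 and 63 differ) is equivalent to the true difference falling outside the signed 64-bit range. The sketch below uses that equivalent range check; the function names are hypothetical, and the exception is modeled by returning an overflow flag with no result, reflecting that rd is left unmodified.

```python
MASK64 = (1 << 64) - 1

def to_signed64(value):
    """Interpret a 64-bit register image as a 2's complement integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value

def dsub(rs, rt):
    """Model DSUB: return (result, overflow); result is None on overflow."""
    diff = to_signed64(rs) - to_signed64(rt)
    if not -(1 << 63) <= diff < (1 << 63):
        return None, True        # integer overflow exception; rd unchanged
    return diff & MASK64, False

def dsubu(rs, rt):
    """Model DSUBU: same subtraction, wraps silently and never traps."""
    return (rs - rt) & MASK64
```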
DSUBU                Doubleword Subtract Unsigned                DSUBU

  31     26 25    21 20    16 15    11 10     6 5       0
| SPECIAL  |   rs   |   rt   |   rd   |   0    |  DSUBU  |
|  000000  |        |        |        | 00000  | 101111  |
      6        5        5        5        5        6



Format: 

DSUBU rd, rs, rt 

Description: 

The contents of general register rt are subtracted from the contents of 
general register rs to form a result. The result is placed into general 
register rd. 

The only difference between this instruction and the DSUB instruction 
is that DSUBU never traps on overflow. No integer overflow exception 
occurs under any circumstances. 

Operation: 



T: GPR[rd] <- GPR[rs] - GPR[rt] 



Exceptions: 

None 
ERET                        Exception Return                        ERET

  31     26  25  24                         6 5       0
|   COP0   | CO |             0             |  ERET   |
|  010000  | 1  |  000 0000 0000 0000 0000  | 011000  |
      6      1              19                   6





Format: 

ERET 

Description: 

ERET is the R4600 instruction for returning from an interrupt, 
exception, or error trap. Unlike a branch or jump instruction, ERET does 
not execute the next instruction. 

ERET must not itself be placed in a branch delay slot. 

If the processor is servicing an error trap (SR_2 = 1), then load the PC 
from the ErrorEPC and clear the ERL bit of the Status register (SR_2). 
Otherwise (SR_2 = 0), load the PC from the EPC, and clear the EXL bit of the 
Status register (SR_1). 

An ERET executed between a LL and SC also causes the SC to fail. 

Operation:

T:      if SR_2 = 1 then
            PC <- ErrorEPC
            SR <- SR_{31..3} || 0 || SR_{1..0}
        else
            PC <- EPC
            SR <- SR_{31..2} || 0 || SR_0
        endif
        LLbit <- 0








Exceptions: 

Coprocessor unusable exception 
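
The ERET selection between ErrorEPC and EPC can be expressed compactly. This is an illustrative sketch only: `eret` is a hypothetical function, the Status register is modeled as a 32-bit integer, and the return value bundles the new PC, the new Status value, and the cleared LLbit.

```python
ERL = 1 << 2   # Status register bit 2 (error level)
EXL = 1 << 1   # Status register bit 1 (exception level)
SR_MASK = 0xFFFFFFFF

def eret(sr, epc, error_epc):
    """Model ERET: return (new_pc, new_sr, llbit)."""
    if sr & ERL:
        # Servicing an error trap: return via ErrorEPC and clear ERL.
        return error_epc, sr & ~ERL & SR_MASK, 0
    # Otherwise return via EPC and clear EXL.
    return epc, sr & ~EXL & SR_MASK, 0
```

Note that when both ERL and EXL are set, only ERL is cleared, so a second ERET is needed to leave exception level.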
J                              Jump                              J




Format: 

J target 

Description: 

The 26-bit target address is shifted left two bits and combined with the 
high-order bits of the address of the delay slot. The program 
unconditionally jumps to this calculated address with a delay of one 
instruction. 

Operation: 




Exceptions: 

None 
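
As a worked illustration of the target calculation described above: the 26-bit target shifted left two bits yields a 28-bit value, so the sketch below assumes that it replaces the low 28 bits of the delay-slot address while the remaining high-order bits are kept. The function name `jump_target` is hypothetical, and addresses are modeled as 64-bit integers.

```python
MASK64 = (1 << 64) - 1
LOW28 = (1 << 28) - 1

def jump_target(delay_slot_pc, target26):
    """Combine the high-order bits of the delay-slot address with target << 2."""
    low = (target26 & 0x3FFFFFF) << 2           # 26-bit field, word aligned
    high = delay_slot_pc & MASK64 & ~LOW28      # bits above bit 27
    return high | low
```

Because only the low 28 bits come from the instruction, a J or JAL can reach targets only within the same 256 MB-aligned region as its delay slot.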
JAL                         Jump And Link                         JAL




Format: 

JAL target 

Description: 

The 26-bit target address is shifted left two bits and combined with the 
high-order bits of the address of the delay slot. The program 
unconditionally jumps to this calculated address with a delay of one 
instruction. The address of the instruction after the delay slot is placed in 
the link register, r31. 

Operation: 




Exceptions: 

None 
JALR                    Jump And Link Register                    JALR

  31     26 25    21 20    16 15    11 10     6 5       0
| SPECIAL  |   rs   |   0    |   rd   |   0    |  JALR   |
|  000000  |        | 00000  |        | 00000  | 001001  |
      6        5        5        5        5        6



Format: 

JALR rs 
JALR rd, rs 

Description: 

The program unconditionally jumps to the address contained in 
general register rs, with a delay of one instruction. The address of the 
instruction after the delay slot is placed in general register rd. The default 
value of rd, if omitted in the assembly language instruction, is 31. 

Register specifiers rs and rd may not be equal, because such an 
instruction does not have the same effect when re-executed. However, an 
attempt to execute this instruction is not trapped, and the result of 
executing such an instruction is undefined. 

Since instructions must be word-aligned, a Jump and Link Register 
instruction must specify a target register (rs) whose two low-order bits are 
zero. If these low-order bits are not zero, an address exception will occur 
when the jump target instruction is subsequently fetched. 

Operation: 



T:      temp <- GPR[rs]
        GPR[rd] <- PC + 8
T+1:    PC <- temp



Exceptions: 

None 
JR                          Jump Register                          JR

  31     26 25    21 20                     6 5       0
| SPECIAL  |   rs   |           0           |   JR    |
|  000000  |        |  000 0000 0000 0000   | 001000  |
      6        5               15                6



Format: 

JR rs 



Description: 

The program unconditionally jumps to the address contained in 
general register rs, with a delay of one instruction. 

Since instructions must be word-aligned, a Jump Register instruction 
must specify a target register (rs) whose two low-order bits are zero. If these 
low-order bits are not zero, an address exception will occur when the jump 
target instruction is subsequently fetched. 

Operation: 



T: temp <- GPR[rs] 

T+1: PC <- temp 



Exceptions: 

None 
LB                            Load Byte                            LB

  31     26 25    21 20    16 15                       0
|    LB    |  base  |   rt   |         offset          |
|  100000  |        |        |                         |
      6        5        5               16



Format: 

LB rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the byte at the 
memory location specified by the effective address are sign-extended and 
loaded into general register rt. 

Operation: 

T:      vAddr <- ((offset_{15})^48 || offset_{15..0}) + GPR[base]
        (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
        pAddr <- pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor ReverseEndian^3)
        mem <- LoadMemory (uncached, BYTE, pAddr, vAddr, DATA)
        byte <- vAddr_{2..0} xor BigEndianCPU^3
        GPR[rt] <- (mem_{7+8*byte})^56 || mem_{7+8*byte..8*byte}



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
LBU                      Load Byte Unsigned                      LBU

  31     26 25    21 20    16 15                       0
|   LBU    |  base  |   rt   |         offset          |
|  100100  |        |        |                         |
      6        5        5               16







Format: 

LBU rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the byte at the 
memory location specified by the effective address are zero-extended and 
loaded into general register rt. 

Operation: 

T:      vAddr <- ((offset_{15})^48 || offset_{15..0}) + GPR[base]
        (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
        pAddr <- pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor ReverseEndian^3)
        mem <- LoadMemory (uncached, BYTE, pAddr, vAddr, DATA)
        byte <- vAddr_{2..0} xor BigEndianCPU^3
        GPR[rt] <- 0^56 || mem_{7+8*byte..8*byte}



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
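
The last two steps of the LB and LBU operations select one byte lane from the doubleword returned by memory and then extend it. A minimal sketch follows, assuming `mem` is the 64-bit doubleword containing the addressed byte; the function name `load_byte` and its parameters are hypothetical.

```python
def load_byte(mem, vaddr, big_endian_cpu, signed):
    """Extract the addressed byte from a 64-bit doubleword and extend it."""
    # byte <- vAddr_{2..0} xor BigEndianCPU^3
    byte = (vaddr & 7) ^ (7 if big_endian_cpu else 0)
    value = (mem >> (8 * byte)) & 0xFF          # mem_{7+8*byte..8*byte}
    if signed and value & 0x80:
        value |= 0xFFFFFFFFFFFFFF00             # LB: replicate bit 7 (56 copies)
    return value                                # LBU: upper 56 bits stay zero
```

For example, with the doubleword 0x1122334455667788, address offset 0 selects 0x88 on a little-endian CPU and 0x11 on a big-endian CPU.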
LD                        Load Doubleword                        LD

  31     26 25    21 20    16 15                       0
|    LD    |  base  |   rt   |         offset          |
|  110111  |        |        |                         |
      6        5        5               16



Format: 

LD rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the 64-bit 
doubleword at the memory location specified by the effective address are 
loaded into general register rt. 

If any of the three least-significant bits of the effective address are non- 
zero, an address error exception occurs. 

Operation: 



T:      vAddr <- ((offset_{15})^48 || offset_{15..0}) + GPR[base]
        (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
        mem <- LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA)
        GPR[rt] <- mem



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
LDCz              Load Doubleword To Coprocessor              LDCz

  31     26 25    21 20    16 15                       0
|   LDCz   |  base  |   rt   |         offset          |
| 1101xx*  |        |        |                         |
      6        5        5               16







Format: 

LDCz rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The processor reads a doubleword 
from the addressed memory location and makes the data available to 
coprocessor unit z. The manner in which each coprocessor uses the data 
is defined by the individual coprocessor specifications. 

If any of the three least-significant bits of the effective address are non- 
zero, an address error exception takes place. 

This instruction is not valid for use with CP0. 

This instruction is undefined when the least-significant bit of the 
rt field is non-zero. 

Execution of the instruction referencing coprocessor 3 causes a 
reserved instruction exception, not a coprocessor unusable exception. 

NOTE: * See the table "Opcode Bit Encoding" on next page, or "CPU Instruction 
Opcode Bit Encoding" at the end of Appendix A. 

Operation: 



T:      vAddr <- ((offset_{15})^48 || offset_{15..0}) + GPR[base]
        (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
        mem <- LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA)
        COPzLD (rt, mem)



Exceptions: 

TLB refill exception 

TLB invalid exception 

Bus error exception 

Address error exception 

Coprocessor unusable exception 

Reserved instruction exception (coprocessor 3) 

Opcode Bit Encoding:

LDCz     Bit #   31   30   29   28   27   26
LDC1              1    1    0    1    0    1
LDC2              1    1    0    1    1    0

Bits 31..28: Opcode
Bits 27..26: Coprocessor Unit Number
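
The opcode encoding can be checked by assembling an instruction word by hand. The sketch below is illustrative only; `ldcz_opcode` and `encode_ldcz` are hypothetical helpers, and units 0 and 3 are rejected because the text states that LDCz is not valid for CP0 and that coprocessor 3 raises a reserved instruction exception.

```python
def ldcz_opcode(z):
    """Return the 6-bit LDCz opcode: 1101 followed by the 2-bit unit number."""
    if z not in (1, 2):
        raise ValueError("LDCz is modeled here only for coprocessor units 1 and 2")
    return (0b1101 << 2) | z

def encode_ldcz(z, rt, base, offset):
    """Assemble the 32-bit word: opcode | base | rt | 16-bit offset."""
    return (
        (ldcz_opcode(z) << 26)
        | ((base & 0x1F) << 21)
        | ((rt & 0x1F) << 16)
        | (offset & 0xFFFF)
    )
```

For instance, `encode_ldcz(1, 2, 29, 8)` corresponds to LDC1 with rt = 2, base = 29, and offset = 8.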
LDL                      Load Doubleword Left                      LDL

  31     26 25    21 20    16 15                       0
|   LDL    |  base  |   rt   |         offset          |
|  011010  |        |        |                         |
      6        5        5               16







Format: 

LDL rt, offset(base) 

Description: 

This instruction can be used in combination with the LDR instruction 
to load a register with eight consecutive bytes from memory, when the 
bytes cross a doubleword boundary. LDL loads the left portion of the 
register with the appropriate part of the high-order doubleword; LDR loads 
the right portion of the register with the appropriate part of the low-order 
doubleword. 

The LDL instruction adds its sign-extended 16-bit offset to the contents 
of general register base to form a virtual address which can specify an 
arbitrary byte. It reads bytes only from the doubleword in memory which 
contains the specified starting byte. From one to eight bytes will be loaded, 
depending on the starting byte specified. 

Conceptually, it starts at the specified byte in memory and loads that 
byte into the high-order (left-most) byte of the register; then it loads bytes 
from memory into the register until it reaches the low-order byte of the 
doubleword in memory. The least-significant (right-most) byte(s) of the 
register will not be changed. 



memory (big-endian)
address 8:    8   9  10  11  12  13  14  15
address 0:    0   1   2   3   4   5   6   7

register $24
before:       A   B   C   D   E   F   G   H
                  LDL $24, 3($0)
after:        3   4   5   6   7   F   G   H



The contents of general register rt are internally bypassed within the 
processor so that no NOP is needed between an immediately preceding 
load instruction which specifies register rt and a following LDL (or LDR) 
instruction which also specifies register rt. 

No address exceptions due to alignment are possible. 
Operation:

T:      vAddr <- ((offset_{15})^48 || offset_{15..0}) + GPR[base]
        (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
        pAddr <- pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor ReverseEndian^3)
        if BigEndianMem = 0 then
            pAddr <- pAddr_{PSIZE-1..3} || 0^3
        endif
        byte <- vAddr_{2..0} xor BigEndianCPU^3
        mem <- LoadMemory (uncached, byte, pAddr, vAddr, DATA)
        GPR[rt] <- mem_{7+8*byte..0} || GPR[rt]_{55-8*byte..0}



Given a doubleword in a register and a doubleword in memory, the 
operation of LDL is as follows: 



LDL

Register:   A B C D E F G H
Memory:     I J K L M N O P

               BigEndianCPU = 0                     BigEndianCPU = 1
vAddr_{2..0}   destination      type   offset      destination      type   offset
                                      LEM  BEM                            LEM  BEM
     0         P B C D E F G H    0    0    7       I J K L M N O P    7    0    0
     1         O P C D E F G H    1    0    6       J K L M N O P H    6    0    1
     2         N O P D E F G H    2    0    5       K L M N O P G H    5    0    2
     3         M N O P E F G H    3    0    4       L M N O P F G H    4    0    3
     4         L M N O P F G H    4    0    3       M N O P E F G H    3    0    4
     5         K L M N O P G H    5    0    2       N O P D E F G H    2    0    5
     6         J K L M N O P H    6    0    1       O P C D E F G H    1    0    6
     7         I J K L M N O P    7    0    0       P B C D E F G H    0    0    7

LEM     Little-endian memory (BigEndianMem = 0)
BEM     BigEndianMem = 1
Type    AccessType (see Table 2.1 on page 3) sent to memory
Offset  pAddr_{2..0} sent to memory

Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
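
The final merge step of LDL, which produces the destination column of the table above, can be sketched as follows. The function name `ldl_merge` is hypothetical; `reg` and `mem` are modeled as 64-bit integers holding the old register contents and the aligned doubleword from memory.

```python
def ldl_merge(reg, mem, vaddr, big_endian_cpu):
    """Compute mem_{7+8*byte..0} || GPR[rt]_{55-8*byte..0}."""
    # byte <- vAddr_{2..0} xor BigEndianCPU^3
    byte = (vaddr & 7) ^ (7 if big_endian_cpu else 0)
    nbits = 8 * byte + 8                 # bits taken from memory
    keep = 64 - nbits                    # low-order register bits preserved
    from_mem = mem & ((1 << nbits) - 1)
    return (from_mem << keep) | (reg & ((1 << keep) - 1))

# Usage with the register and memory contents from the table above.
reg = int.from_bytes(b"ABCDEFGH", "big")
mem = int.from_bytes(b"IJKLMNOP", "big")
row = ldl_merge(reg, mem, 3, True).to_bytes(8, "big")   # big-endian CPU, vAddr_{2..0} = 3
```

Encoding the letters as ASCII bytes makes each result directly comparable with a table row.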
LDR                     Load Doubleword Right                     LDR

  31     26 25    21 20    16 15                       0
|   LDR    |  base  |   rt   |         offset          |
|  011011  |        |        |                         |
      6        5        5               16







Format: 

LDR rt, offset(base) 

Description: 

This instruction can be used in combination with the LDL instruction 
to load a register with eight consecutive bytes from memory, when the 
bytes cross a doubleword boundary. LDR loads the right portion of the 
register with the appropriate part of the low-order doubleword; LDL loads 
the left portion of the register with the appropriate part of the high-order 
doubleword. 

The LDR instruction adds its sign-extended 16-bit offset to the 
contents of general register base to form a virtual address which can 
specify an arbitrary byte. It reads bytes only from the doubleword in 
memory which contains the specified starting byte. From one to eight 
bytes will be loaded, depending on the starting byte specified. 

Conceptually, it starts at the specified byte in memory and loads that 
byte into the low-order (right-most) byte of the register; then it loads bytes 
from memory into the register until it reaches the high-order byte of the 
doubleword in memory. The most significant (left-most) byte(s) of the 
register will not be changed. 



memory (big-endian)
address 8:    8   9  10  11  12  13  14  15
address 0:    0   1   2   3   4   5   6   7

register $24
before:       A   B   C   D   E   F   G   H
                  LDR $24, 4($0)
after:        A   B   C   0   1   2   3   4





















The contents of general register rt are internally bypassed within the 
processor so that no NOP is needed between an immediately preceding 
load instruction which specifies register rt and a following LDR (or LDL) 
instruction which also specifies register rt. 

No address exceptions due to alignment are possible. 
Operation:

T:      vAddr <- ((offset_{15})^48 || offset_{15..0}) + GPR[base]
        (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
        pAddr <- pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor ReverseEndian^3)
        if BigEndianMem = 1 then
            pAddr <- pAddr_{PSIZE-1..3} || 0^3
        endif
        byte <- vAddr_{2..0} xor BigEndianCPU^3
        mem <- LoadMemory (uncached, byte, pAddr, vAddr, DATA)
        GPR[rt] <- GPR[rt]_{63..64-8*byte} || mem_{63..8*byte}



Given a doubleword in a register and a doubleword in memory, the 
operation of LDR is as follows: 



LDR

Register:   A B C D E F G H
Memory:     I J K L M N O P

               BigEndianCPU = 0                     BigEndianCPU = 1
vAddr_{2..0}   destination      type   offset      destination      type   offset
                                      LEM  BEM                            LEM  BEM
     0         I J K L M N O P    7    0    0       A B C D E F G I    0    7    0
     1         A I J K L M N O    6    1    0       A B C D E F I J    1    6    0
     2         A B I J K L M N    5    2    0       A B C D E I J K    2    5    0
     3         A B C I J K L M    4    3    0       A B C D I J K L    3    4    0
     4         A B C D I J K L    3    4    0       A B C I J K L M    4    3    0
     5         A B C D E I J K    2    5    0       A B I J K L M N    5    2    0
     6         A B C D E F I J    1    6    0       A I J K L M N O    6    1    0
     7         A B C D E F G I    0    7    0       I J K L M N O P    7    0    0

LEM     Little-endian memory (BigEndianMem = 0)
BEM     BigEndianMem = 1
Type    AccessType (see Table 2.1 on page 3) sent to memory
Offset  pAddr_{2..0} sent to memory

Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
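
To show how LDL and LDR combine into an unaligned doubleword load, the sketch below models both merge steps for a big-endian CPU over a byte array. All names are hypothetical, and the pairing of LDL at the starting address with LDR at the starting address plus 7 is the usual big-endian idiom rather than something stated in this section.

```python
def ldl_merge(reg, mem, byte):
    """mem_{7+8*byte..0} || reg_{55-8*byte..0}"""
    nbits = 8 * byte + 8
    keep = 64 - nbits
    return ((mem & ((1 << nbits) - 1)) << keep) | (reg & ((1 << keep) - 1))

def ldr_merge(reg, mem, byte):
    """reg_{63..64-8*byte} || mem_{63..8*byte}"""
    shift = 8 * byte
    kept_high = (reg >> (64 - shift)) << (64 - shift)
    return kept_high | (mem >> shift)

def aligned_doubleword(memory, addr):
    """Read the aligned big-endian doubleword that contains addr."""
    base = addr & ~7
    return int.from_bytes(memory[base:base + 8], "big")

def unaligned_load_be(memory, addr):
    """Emulate LDL rt, 0(addr) followed by LDR rt, 7(addr) on a big-endian CPU."""
    reg = 0
    reg = ldl_merge(reg, aligned_doubleword(memory, addr), (addr & 7) ^ 7)
    last = addr + 7
    reg = ldr_merge(reg, aligned_doubleword(memory, last), (last & 7) ^ 7)
    return reg

memory = bytes(range(16))
value = unaligned_load_be(memory, 3)   # bytes 3..10 as one doubleword
```

Each instruction touches only one aligned doubleword, which is why neither can raise an alignment exception.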
LH                          Load Halfword                          LH

  31     26 25    21 20    16 15                       0
|    LH    |  base  |   rt   |         offset          |
|  100001  |        |        |                         |
      6        5        5               16







Format: 

LH rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the halfword at the 
memory location specified by the effective address are sign-extended and 
loaded into general register rt. 

If the least-significant bit of the effective address is non-zero, an 
address error exception occurs. 

Operation: 



T:      vAddr <- ((offset_{15})^48 || offset_{15..0}) + GPR[base]
        (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
        pAddr <- pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor (ReverseEndian^2 || 0))
        mem <- LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA)
        byte <- vAddr_{2..0} xor (BigEndianCPU^2 || 0)
        GPR[rt] <- (mem_{15+8*byte})^48 || mem_{15+8*byte..8*byte}



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
LHU                    Load Halfword Unsigned                    LHU

  31     26 25    21 20    16 15                       0
|   LHU    |  base  |   rt   |         offset          |
|  100101  |        |        |                         |
      6        5        5               16







Format: 

LHU rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the halfword at the 
memory location specified by the effective address are zero-extended and 
loaded into general register rt. 

If the least-significant bit of the effective address is non-zero, an 
address error exception occurs. 

Operation: 



T:      vAddr <- ((offset_{15})^48 || offset_{15..0}) + GPR[base]
        (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
        pAddr <- pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor (ReverseEndian^2 || 0))
        mem <- LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA)
        byte <- vAddr_{2..0} xor (BigEndianCPU^2 || 0)
        GPR[rt] <- 0^48 || mem_{15+8*byte..8*byte}



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
LL                           Load Linked                           LL

  31     26 25    21 20    16 15                       0
|    LL    |  base  |   rt   |         offset          |
|  110000  |        |        |                         |
      6        5        5               16







Format: 

LL rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the word at the 
memory location specified by the effective address are loaded into general 
register rt. The loaded word is sign-extended. 

This instruction implicitly performs a SYNC operation; all loads and 
stores to shared memory fetched prior to the LL must access memory 
before the LL, and loads and stores to shared memory fetched subsequent 
to the LL must access memory after the LL. The processor begins checking 
the accessed word for modification by other processors and devices. 

Load Linked and Store Conditional can be used to atomically update 
memory locations as shown: 



L1:
        LL      T1, (T0)
        ADD     T2, T1, 1
        SC      T2, (T0)
        BEQ     T2, 0, L1
        NOP

This atomically increments the word addressed by T0. Changing the 
ADD to an OR changes this to an atomic bit set. 

This instruction is available in User mode, and it is not necessary for 
CP0 to be enabled. 

The operation of LL is undefined if the addressed location is uncached 
and, for synchronization between multiple processors, the operation of LL 
is undefined if the addressed location is noncoherent. A cache miss that 
occurs between LL and SC may cause SC to fail, so no load or store 
operation should occur between LL and SC, otherwise the SC may never 
be successful. Exceptions also cause SC to fail, so persistent exceptions 
must be avoided. 

If either of the two least-significant bits of the effective address are non- 
zero, an address error exception takes place. 
Operation:

T:      vAddr <- ((offset_{15})^48 || offset_{15..0}) + GPR[base]
        (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
        pAddr <- pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor (ReverseEndian || 0^2))
        mem <- LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
        byte <- vAddr_{2..0} xor (BigEndianCPU || 0^2)
        GPR[rt] <- (mem_{31+8*byte})^32 || mem_{31+8*byte..8*byte}
        LLbit <- 1
        SyncOperation()



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
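
The retry behavior of the LL/SC sequence can be illustrated with a toy software model. This is a deliberately simplified sketch, not a description of the hardware: the class `LinkedMemory`, its methods, and the `interfere` hook are all hypothetical, and the model assumes that any store to the linked address by another agent clears the LLbit.

```python
class LinkedMemory:
    """Toy word memory with a single LLbit and linked address."""

    def __init__(self):
        self.words = {}
        self.llbit = 0
        self.link_addr = None

    def ll(self, addr):
        """Load Linked: read the word and start monitoring it."""
        self.llbit = 1
        self.link_addr = addr
        return self.words.get(addr, 0)

    def store(self, addr, value):
        """A store by another processor or device; breaks a matching link."""
        self.words[addr] = value & 0xFFFFFFFF
        if addr == self.link_addr:
            self.llbit = 0

    def sc(self, addr, value):
        """Store Conditional: returns 1 on success, 0 on failure."""
        ok = self.llbit == 1 and addr == self.link_addr
        if ok:
            self.words[addr] = value & 0xFFFFFFFF
        self.llbit = 0
        return 1 if ok else 0

def atomic_increment(mem, addr, interfere=None):
    """Mirror the L1 loop: retry until SC succeeds; return the attempt count."""
    attempts = 0
    while True:
        attempts += 1
        t1 = mem.ll(addr)
        if interfere is not None and attempts == 1:
            interfere()              # simulate a competing write on the first try
        if mem.sc(addr, t1 + 1):
            return attempts
```

When the competing write lands between LL and SC, the first SC fails and the loop re-reads the updated value, so the increment is applied on top of the other agent's store rather than overwriting it.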
LLD                    Load Linked Doubleword                    LLD

  31     26 25    21 20    16 15                       0
|   LLD    |  base  |   rt   |         offset          |
|  110100  |        |        |                         |
      6        5        5               16







Format: 

LLD rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the doubleword at 
the memory location specified by the effective address are loaded into 
general register rt. 

This instruction implicitly performs a SYNC operation; all loads and 
stores to shared memory fetched prior to the LLD must access memory 
before the LLD, and loads and stores to shared memory fetched 
subsequent to the LLD must access memory after the LLD. The processor 
begins checking the accessed doubleword for modification by other 
processors and devices. 

Load Linked Doubleword and Store Conditional Doubleword can be 
used to atomically update memory locations: 



L1:
        LLD     T1, (T0)
        ADD     T2, T1, 1
        SCD     T2, (T0)
        BEQ     T2, 0, L1
        NOP

This atomically increments the doubleword addressed by T0. Changing the 
ADD to an OR changes this to an atomic bit set. 

The operation of LLD is undefined if the addressed location is 
uncached and, for synchronization between multiple processors, the 
operation of LLD is undefined if the addressed location is noncoherent. A 
cache miss that occurs between LLD and SCD may cause SCD to fail, so 
no load or store operation should occur between LLD and SCD, otherwise 
the SCD may never be successful. Exceptions also cause SCD to fail, so 
persistent exceptions must be avoided. 

This instruction is available in User mode, and it is not necessary for 
CP0 to be enabled. 

If any of the three least-significant bits of the effective address are non- 
zero, an address error exception takes place. 
Operation:

T:      vAddr <- ((offset_{15})^48 || offset_{15..0}) + GPR[base]
        (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
        mem <- LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA)
        GPR[rt] <- mem
        LLbit <- 1
        SyncOperation()



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
LUI    Load Upper Immediate    LUI

Bits:    31..26    25..21    20..16    15..0
Field:   LUI       0         rt        immediate
         001111    00000
Width:   6         5         5         16



Format: 

LUI rt, immediate 

Description: 

The 16-bit immediate is shifted left 16 bits and concatenated to 16 bits
of zeros. The result is placed into general register rt. The loaded word is
sign-extended.

Operation: 



T:  GPR[rt] <- (immediate_15)^32 || immediate || 0^16



Exceptions: 

None 
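
As a small illustration of the Operation line, the following C sketch computes the 64-bit register value that LUI produces. The function name `lui_value` is an assumption made for this example only.

```c
#include <stdint.h>

/* LUI: place the immediate in bits 31..16, zero bits 15..0,
 * then sign-extend from bit 31 into the upper 32 bits. */
static uint64_t lui_value(uint16_t immediate) {
    uint64_t word = (uint64_t)immediate << 16;
    if (word & 0x80000000ull)
        word |= 0xFFFFFFFF00000000ull;
    return word;
}
```

For example, an immediate of 0x8000 sets bit 31, so the upper 32 bits of the result are all ones.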
LW    Load Word    LW

Bits:    31..26    25..21    20..16    15..0
Field:   LW        base      rt        offset
         100011
Width:   6         5         5         16







Format: 

LW rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the word at the 
memory location specified by the effective address are loaded into general 
register rt. The loaded word is sign-extended.

If either of the two least-significant bits of the effective address is non- 
zero, an address error exception occurs. 

Operation: 



T:  vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
    (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
    pAddr <- pAddr_PSIZE-1..3 || (pAddr_2..0 xor (ReverseEndian || 0^2))
    mem <- LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
    byte <- vAddr_2..0 xor (BigEndianCPU || 0^2)
    GPR[rt] <- (mem_31+8*byte)^32 || mem_31+8*byte..8*byte



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
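
The alignment check and the extension of the loaded word can be sketched in C as follows. This is a simplified model under stated assumptions: memory is a plain big-endian byte array, address translation is ignored, and `model_load_word` is a hypothetical helper. Passing `zero_extend = true` models the zero-extension that LWU (described later in this appendix) performs instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns false to model an address error exception
 * (either of the two low address bits is nonzero). */
static bool model_load_word(const uint8_t *mem, uint64_t vaddr,
                            bool zero_extend, uint64_t *rt) {
    if (vaddr & 0x3)
        return false;
    /* Assemble the word from big-endian memory. */
    uint64_t w = ((uint64_t)mem[vaddr] << 24) |
                 ((uint64_t)mem[vaddr + 1] << 16) |
                 ((uint64_t)mem[vaddr + 2] << 8) |
                 (uint64_t)mem[vaddr + 3];
    /* LW replicates bit 31 into bits 63..32; LWU leaves them zero. */
    if (!zero_extend && (w & 0x80000000ull))
        w |= 0xFFFFFFFF00000000ull;
    *rt = w;
    return true;
}
```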
LWCz    Load Word To Coprocessor    LWCz

Bits:    31..26     25..21    20..16    15..0
Field:   LWCz       base      rt        offset
         1100xx*
Width:   6          5         5         16



Format: 

LWCz rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The processor reads a word from 
the addressed memory location, and makes the data available to 
coprocessor unit z. 

The manner in which each coprocessor uses the data is defined by the 
individual coprocessor specifications. 

If either of the two least-significant bits of the effective address is non- 
zero, an address error exception occurs. 

This instruction is not valid for use with CP0.

NOTE: *See the table "Opcode Bit Encoding" on next page, or "CPU Instruction
Opcode Bit Encoding" at the end of Appendix A.

Operation: 



T:  vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
    (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
    pAddr <- pAddr_PSIZE-1..3 || (pAddr_2..0 xor (ReverseEndian || 0^2))
    mem <- LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA)
    byte <- vAddr_2..0 xor (BigEndianCPU || 0^2)
    COPzLW (byte, rt, mem)



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
Coprocessor unusable exception 

Opcode Bit Encoding:

LWCz    Bit #   31  30  29  28  27  26
LWC1             1   1   0   0   0   1
LWC2             1   1   0   0   1   0

Bits 31..28: Opcode
Bits 27..26: Coprocessor Unit Number
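
The field layout and the unit-number encoding can be illustrated with a short C decoder. This is a sketch only; the `IType` structure and the function names are assumptions for this example. It accepts only units 1 and 2, the two encodings shown in the table.

```c
#include <stdint.h>

/* Fields of an immediate-format load or store instruction. */
typedef struct {
    unsigned opcode;  /* bits 31..26 */
    unsigned base;    /* bits 25..21 */
    unsigned rt;      /* bits 20..16 */
    int offset;       /* bits 15..0, sign-extended */
} IType;

static IType decode_itype(uint32_t insn) {
    IType f;
    f.opcode = (insn >> 26) & 0x3F;
    f.base = (insn >> 21) & 0x1F;
    f.rt = (insn >> 16) & 0x1F;
    f.offset = (int)(insn & 0xFFFF);
    if (f.offset & 0x8000)
        f.offset -= 0x10000;
    return f;
}

/* Returns the coprocessor unit z for LWC1 or LWC2, or -1 otherwise. */
static int lwc_unit(uint32_t insn) {
    unsigned op = (insn >> 26) & 0x3F;
    unsigned z = op & 0x3;
    if ((op >> 2) == 0xC && (z == 1 || z == 2))
        return (int)z;
    return -1;
}
```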
LWL    Load Word Left    LWL

Bits:    31..26    25..21    20..16    15..0
Field:   LWL       base      rt        offset
         100010
Width:   6         5         5         16







Format: 

LWL rt, offset(base) 

Description: 

This instruction can be used in combination with the LWR instruction 
to load a register with four consecutive bytes from memory, when the bytes 
cross a word boundary. LWL loads the left portion of the register with the 
appropriate part of the high-order word; LWR loads the right portion of the 
register with the appropriate part of the low-order word. 

The LWL instruction adds its sign-extended 16-bit offset to the 
contents of general register base to form a virtual address which can 
specify an arbitrary byte. It reads bytes only from the word in memory 
which contains the specified starting byte. From one to four bytes will be 
loaded, depending on the starting byte specified. The loaded word is sign- 
extended. 

Conceptually, it starts at the specified byte in memory and loads that 
byte into the high-order (left-most) byte of the register; then it loads bytes 
from memory into the register until it reaches the low-order byte of the 
word in memory. The least-significant (right-most) byte(s) of the register 
will not be changed. 





memory (big-endian):
    address 4:   4  5  6  7
    address 0:   0  1  2  3

register $24, before:                 A  B  C  D
register $24, after LWL $24,1($0):    1  2  3  D

















The contents of general register rt are internally bypassed within the 
processor so that no NOP is needed between an immediately preceding 
load instruction which specifies register rt and a following LWL (or LWR) 
instruction which also specifies register rt. 

No address exceptions due to alignment are possible. 
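
The byte selection described above can be sketched in C for the big-endian case. This is a simplified model, not the processor's implementation: it covers only the low 32 bits of the register (the final sign-extension into bits 63..32 is omitted), memory is a plain byte array, and the names `lwl_be` and `lwr_be` are hypothetical.

```c
#include <stdint.h>

/* LWL, big-endian: copy bytes from addr up to the end of its word
 * into the register, starting at the most-significant byte. */
static uint32_t lwl_be(const uint8_t *mem, uint32_t addr, uint32_t reg) {
    uint32_t n = 4 - (addr & 3);  /* bytes loaded: 1..4 */
    for (uint32_t i = 0; i < n; i++) {
        unsigned shift = 24 - 8 * i;
        reg = (reg & ~(0xFFu << shift)) | ((uint32_t)mem[addr + i] << shift);
    }
    return reg;
}

/* LWR, big-endian: copy bytes from addr down to the start of its word
 * into the register, starting at the least-significant byte. */
static uint32_t lwr_be(const uint8_t *mem, uint32_t addr, uint32_t reg) {
    uint32_t n = (addr & 3) + 1;  /* bytes loaded: 1..4 */
    for (uint32_t i = 0; i < n; i++) {
        unsigned shift = 8 * i;
        reg = (reg & ~(0xFFu << shift)) | ((uint32_t)mem[addr - i] << shift);
    }
    return reg;
}
```

With memory bytes equal to their addresses, `lwl_be(mem, 1, r)` followed by `lwr_be(mem, 4, r)` assembles the unaligned word made of bytes 1 through 4, which is the pairing the description refers to.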
Operation:

T:  vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
    (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
    pAddr <- pAddr_PSIZE-1..3 || (pAddr_2..0 xor ReverseEndian^3)
    if BigEndianMem = 0 then
        pAddr <- pAddr_PSIZE-1..3 || 0^3
    endif
    byte <- vAddr_1..0 xor BigEndianCPU^2
    word <- vAddr_2 xor BigEndianCPU
    mem <- LoadMemory (uncached, 0 || byte, pAddr, vAddr, DATA)
    temp <- mem_31+32*word-8*byte..32*word || GPR[rt]_23-8*byte..0
    GPR[rt] <- (temp_31)^32 || temp



Given a doubleword in a register and a doubleword in memory, the 
operation of LWL is as follows: 



LWL

Register:   A  B  C  D  E  F  G  H
Memory:     I  J  K  L  M  N  O  P

              BigEndianCPU = 0                   BigEndianCPU = 1
vAddr_2..0    destination  type  offset          destination  type  offset
                                 LEM  BEM                           LEM  BEM
0             SSSSPFGH     0     0    7          SSSSIJKL     3     4    0
1             SSSSOPGH     1     0    6          SSSSJKLH     2     4    1
2             SSSSNOPH     2     0    5          SSSSKLGH     1     4    2
3             SSSSMNOP     3     0    4          SSSSLFGH     0     4    3
4             SSSSLFGH     0     4    3          SSSSMNOP     3     0    4
5             SSSSKLGH     1     4    2          SSSSNOPH     2     0    5
6             SSSSJKLH     2     4    1          SSSSOPGH     1     0    6
7             SSSSIJKL     3     4    0          SSSSPFGH     0     0    7

Key to table:

LEM     Little-endian memory (BigEndianMem = 0)
BEM     BigEndianMem = 1
Type    AccessType (see Table 2.1 on page 3) sent to memory
Offset  pAddr_2..0 sent to memory
S       sign-extend of destination_31

Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
LWR    Load Word Right    LWR

Bits:    31..26    25..21    20..16    15..0
Field:   LWR       base      rt        offset
         100110
Width:   6         5         5         16







Format: 

LWR rt, offset(base) 

Description: 

This instruction can be used in combination with the LWL instruction 
to load a register with four consecutive bytes from memory, when the bytes 
cross a word boundary. LWR loads the right portion of the register with 
the appropriate part of the low-order word; LWL loads the left portion of 
the register with the appropriate part of the high-order word. 

The LWR instruction adds its sign-extended 16-bit offset to the 
contents of general register base to form a virtual address which can 
specify an arbitrary byte. It reads bytes only from the word in memory 
which contains the specified starting byte. From one to four bytes will be 
loaded, depending on the starting byte specified. The loaded word is sign- 
extended. 

Conceptually, it starts at the specified byte in memory and loads that 
byte into the low-order (right-most) byte of the register; then it loads bytes 
from memory into the register until it reaches the high-order byte of the 
word in memory. The most significant (left-most) byte(s) of the register will 
not be changed. 



memory (big-endian):
    address 4:   4  5  6  7
    address 0:   0  1  2  3

register $24, before:                 A  B  C  D
register $24, after LWR $24,4($0):    A  B  C  4















The contents of general register rt are internally bypassed within the 
processor so that no NOP is needed between an immediately preceding 
load instruction which specifies register rt and a following LWR (or LWL) 
instruction which also specifies register rt. 

No address exceptions due to alignment are possible. 
Operation:

T:  vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
    (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
    pAddr <- pAddr_PSIZE-1..3 || (pAddr_2..0 xor ReverseEndian^3)
    if BigEndianMem = 1 then
        pAddr <- pAddr_PSIZE-1..3 || 0^3
    endif
    byte <- vAddr_1..0 xor BigEndianCPU^2
    word <- vAddr_2 xor BigEndianCPU
    mem <- LoadMemory (uncached, 0 || byte, pAddr, vAddr, DATA)
    temp <- GPR[rt]_31..32-8*byte || mem_31+32*word..32*word+8*byte
    GPR[rt] <- (temp_31)^32 || temp



Given a doubleword in a register and a doubleword in memory, the
operation of LWR is as follows:

LWR

Register:   A  B  C  D  E  F  G  H
Memory:     I  J  K  L  M  N  O  P

              BigEndianCPU = 0                   BigEndianCPU = 1
vAddr_2..0    destination  type  offset          destination  type  offset
                                 LEM  BEM                           LEM  BEM
0             SSSSMNOP     0     0    4          SSSSEFGI     0     7    0
1             SSSSEMNO     1     1    4          SSSSEFIJ     1     6    0
2             SSSSEFMN     2     2    4          SSSSEIJK     2     5    0
3             SSSSEFGM     3     3    4          SSSSIJKL     3     4    0
4             SSSSIJKL     0     4    0          SSSSEFGM     0     3    4
5             SSSSEIJK     1     5    0          SSSSEFMN     1     2    4
6             SSSSEFIJ     2     6    0          SSSSEMNO     2     1    4
7             SSSSEFGI     3     7    0          SSSSMNOP     3     0    4

Key to table:

LEM     Little-endian memory (BigEndianMem = 0)
BEM     BigEndianMem = 1
Type    AccessType (see Table 2.1 on page 3) sent to memory
Offset  pAddr_2..0 sent to memory
S       sign-extend of destination_31

Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
LWU    Load Word Unsigned    LWU

Bits:    31..26    25..21    20..16    15..0
Field:   LWU       base      rt        offset
         100111
Width:   6         5         5         16







Format: 

LWU rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the word at the 
memory location specified by the effective address are loaded into general 
register rt. The loaded word is zero-extended.

If either of the two least-significant bits of the effective address is non- 
zero, an address error exception occurs. 

Operation: 



T:  vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
    (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
    pAddr <- pAddr_PSIZE-1..3 || (pAddr_2..0 xor (ReverseEndian || 0^2))
    mem <- LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
    byte <- vAddr_2..0 xor (BigEndianCPU || 0^2)
    GPR[rt] <- 0^32 || mem_31+8*byte..8*byte



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
MFC0    Move From System Control Coprocessor    MFC0

Bits:    31..26    25..21    20..16    15..11    10..0
Field:   COP0      MF        rt        rd        0
         010000    00000                         000 0000 0000
Width:   6         5         5         5         11



Format: 

MFC0 rt, rd

Description: 

The contents of coprocessor register rd of the CP0 are loaded into 
general register rt. May be used on both 32-bit and 64-bit CP0 registers. 

Operation: 



T:    data <- CPR[0,rd]
T+1:  GPR[rt] <- (data_31)^32 || data_31..0



Exceptions: 

Coprocessor unusable exception 
MFCz    Move From Coprocessor    MFCz

Bits:    31..26     25..21    20..16    15..11    10..0
Field:   COPz       MF        rt        rd        0
         0100xx*    00000                         000 0000 0000
Width:   6          5         5         5         11



Note: *See the table "Opcode Bit Encoding" on next page, or "CPU 
Instruction Opcode Bit Encoding" at the end of Appendix A. 

Format: 

MFCz rt, rd 

Description: 

The contents of coprocessor register rd of coprocessor z are loaded into 
general register rt. 

Execution of the instruction referencing coprocessor 3 causes a 
reserved instruction exception, not a coprocessor unusable exception. 



Operation: 








T:    if rd_0 = 0 then
          data <- CPR[z,rd_4..1 || 0]_31..0
      else
          data <- CPR[z,rd_4..1 || 0]_63..32
      endif
T+1:  GPR[rt] <- (data_31)^32 || data





Exceptions: 

Coprocessor unusable exception 

Reserved instruction exception (coprocessor 3) 
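
The Operation block selects one half of a 64-bit coprocessor register according to bit 0 of rd, then sign-extends it. A minimal C sketch of that selection follows; the array `cpr` standing in for the coprocessor register file and the name `mfc_value` are assumptions made for this example.

```c
#include <stdint.h>

/* Model of the value MFCz writes to GPR[rt].
 * cpr holds 32 coprocessor registers; only even indices are used. */
static uint64_t mfc_value(const uint64_t *cpr, unsigned rd) {
    uint64_t reg = cpr[rd & 0x1E];           /* rd_4..1 || 0 */
    uint64_t data = (rd & 1) ? (reg >> 32)   /* bits 63..32 */
                             : (reg & 0xFFFFFFFFull); /* bits 31..0 */
    if (data & 0x80000000ull)                /* (data_31)^32 || data */
        data |= 0xFFFFFFFF00000000ull;
    return data;
}
```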

Opcode Bit Encoding: 



MFCz    Bit #   31  30  29  28  27  26  25  24  23  22  21
MFC0             0   1   0   0   0   0   0   0   0   0   0
MFC1             0   1   0   0   0   1   0   0   0   0   0
MFC2             0   1   0   0   1   0   0   0   0   0   0

Bits 31..28: Opcode
Bits 27..26: Coprocessor Unit Number
Bits 25..21: Coprocessor Suboperation























A-90 



CPU Instruction Set Details 



Appendix A 



MFHI    Move From HI    MFHI

Bits:    31..26     25..16          15..11    10..6    5..0
Field:   SPECIAL    0               rd        0        MFHI
         000000     00 0000 0000              00000    010000
Width:   6          10              5         5        6



Format: 

MFHI rd 

Description: 

The contents of special register HI are loaded into general register rd. 

To ensure proper operation in the event of interruptions, the two 
instructions which follow a MFHI instruction may not be any of the 
instructions which modify the HI register: MULT, MULTU, DIV, DIVU, 
MTHI, DMULT, DMULTU, DDIV, DDIVU. 



Operation: 



GPR[rd] <- HI 



Exceptions: 

None 
MFLO    Move From LO    MFLO

Bits:    31..26     25..16          15..11    10..6    5..0
Field:   SPECIAL    0               rd        0        MFLO
         000000     00 0000 0000              00000    010010
Width:   6          10              5         5        6



Format: 

MFLO rd 

Description: 

The contents of special register LO are loaded into general register rd. 

To ensure proper operation in the event of interruptions, the two 
instructions which follow a MFLO instruction may not be any of the 
instructions which modify the LO register: MULT, MULTU, DIV, DIVU, 
MTLO, DMULT, DMULTU, DDIV, DDIVU. 



Operation: 



T:  GPR[rd] <- LO



Exceptions: 

None 
MTC0    Move To System Control Coprocessor    MTC0

Bits:    31..26    25..21    20..16    15..11    10..0
Field:   COP0      MT        rt        rd        0
         010000    00100                         000 0000 0000
Width:   6         5         5         5         11



Format: 

MTC0 rt, rd

Description: 

The contents of general register rt are loaded into coprocessor register 
rd of CP0. 

Because the state of the virtual address translation system may be
altered by this instruction, the operation of load instructions, store
instructions, and TLB operations immediately prior to and after this
instruction is undefined.



Operation: 



T:    data <- GPR[rt]
T+1:  CPR[0,rd] <- data



Exceptions: 

Coprocessor unusable exception 
MTCz    Move To Coprocessor    MTCz

Bits:    31..26     25..21    20..16    15..11    10..0
Field:   COPz       MT        rt        rd        0
         0100xx*    00100                         000 0000 0000
Width:   6          5         5         5         11



Format: 

MTCz rt, rd 

Description: 

The contents of general register rt are loaded into coprocessor register 
rd of coprocessor z. Execution of the instruction referencing coprocessor 
3 causes a reserved instruction exception, not a coprocessor unusable 
exception. 



Operation: 



T:    data <- GPR[rt]_31..0
T+1:  if rd_0 = 0 then
          CPR[z,rd_4..1 || 0] <- CPR[z,rd_4..1 || 0]_63..32 || data
      else
          CPR[z,rd_4..1 || 0] <- data || CPR[z,rd_4..1 || 0]_31..0
      endif



Exceptions: 

Coprocessor unusable exception 

Reserved instruction exception (coprocessor 3) 

*Opcode Bit Encoding:

MTCz    Bit #   31  30  29  28  27  26  25  24  23  22  21
COP0             0   1   0   0   0   0   0   0   1   0   0
COP1             0   1   0   0   0   1   0   0   1   0   0
COP2             0   1   0   0   1   0   0   0   1   0   0

Bits 31..28: Opcode
Bits 27..26: Coprocessor Unit Number
Bits 25..21: Coprocessor Suboperation
MTHI    Move To HI    MTHI

Bits:    31..26     25..21    20..6                  5..0
Field:   SPECIAL    rs        0                      MTHI
         000000               000 0000 0000 0000     010001
Width:   6          5         15                     6



Format: 

MTHI rs 

Description: 

The contents of general register rs are loaded into special register HI. 

If a MTHI operation is executed following a MULT, MULTU, DIV, or 
DIVU instruction, but before any MFLO, MFHI, MTLO, or MTHI 
instructions, the contents of special register LO are undefined. 

Operation: 



T-2:  HI <- undefined
T-1:  HI <- undefined
T:    HI <- GPR[rs]



Exceptions: 

None 
MTLO    Move To LO    MTLO

Bits:    31..26     25..21    20..6                  5..0
Field:   SPECIAL    rs        0                      MTLO
         000000               000 0000 0000 0000     010011
Width:   6          5         15                     6



Format: 

MTLO rs 

Description: 

The contents of general register rs are loaded into special register LO. 

If a MTLO operation is executed following a MULT, MULTU, DIV, or 
DIVU instruction, but before any MFLO, MFHI, MTLO, or MTHI 
instructions, the contents of special register HI are undefined. 

Operation: 



T-2:  LO <- undefined
T-1:  LO <- undefined
T:    LO <- GPR[rs]



Exceptions: 

None 






MULT    Multiply    MULT

Bits:    31..26     25..21    20..16    15..6           5..0
Field:   SPECIAL    rs        rt        0               MULT
         000000                         00 0000 0000    011000
Width:   6          5         5         10              6






Format: 

MULT rs, rt 













Description: 

The contents of general registers rs and rt are multiplied, treating both 
operands as 32-bit 2's complement values. No integer overflow exception 
occurs under any circumstances. The operands must be valid 32-bit, sign- 
extended values. 

When the operation completes, the low-order word of the double result 
is loaded into special register LO, and the high-order word of the double 
result is loaded into special register HI.

If either of the two preceding instructions is MFHI or MFLO, the results 
of these instructions are undefined. Correct operation requires separating 
reads of HI or LO from writes by a minimum of two other instructions. 

Operation: 



T-2:  LO <- undefined
      HI <- undefined
T-1:  LO <- undefined
      HI <- undefined
T:    t <- GPR[rs]_31..0 * GPR[rt]_31..0
      LO <- (t_31)^32 || t_31..0
      HI <- (t_63)^32 || t_63..32



Exceptions: 

None 
MULTU    Multiply Unsigned    MULTU

Bits:    31..26     25..21    20..16    15..6           5..0
Field:   SPECIAL    rs        rt        0               MULTU
         000000                         00 0000 0000    011001
Width:   6          5         5         10              6



Format: 

MULTU rs, rt 

Description: 

The contents of general register rs and the contents of general register 
rt are multiplied, treating both operands as unsigned values. No overflow 
exception occurs under any circumstances. The operands must be valid 
32-bit, sign-extended values. 

When the operation completes, the low-order word of the double result 
is loaded into special register LO, and the high-order word of the double 
result is loaded into special register HI. 

If either of the two preceding instructions is MFHI or MFLO, the results 
of these instructions are undefined. Correct operation requires separating 
reads of HI or LO from writes by a minimum of two instructions. 

Operation: 



T-2:  LO <- undefined
      HI <- undefined
T-1:  LO <- undefined
      HI <- undefined
T:    t <- (0 || GPR[rs]_31..0) * (0 || GPR[rt]_31..0)
      LO <- (t_31)^32 || t_31..0
      HI <- (t_63)^32 || t_63..32



Exceptions: 

None 
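
The way the 64-bit product is split into HI and LO, for both MULT and MULTU, can be sketched in C as below. The `HiLo` structure and the function names are hypothetical; the sketch models only the final register contents, not the pipeline timing or the two-instruction spacing rule.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t hi, lo; } HiLo;

/* Sign-extend a 32-bit word into a 64-bit register value. */
static uint64_t sext32(uint64_t w) {
    w &= 0xFFFFFFFFull;
    return (w & 0x80000000ull) ? (w | 0xFFFFFFFF00000000ull) : w;
}

/* is_signed = true models MULT, false models MULTU. Only the low
 * 32 bits of each operand are used. Unsigned wraparound arithmetic
 * gives the correct 64-bit two's-complement product in both cases. */
static HiLo model_multiply(uint64_t rs, uint64_t rt, bool is_signed) {
    uint64_t a = is_signed ? sext32(rs) : (rs & 0xFFFFFFFFull);
    uint64_t b = is_signed ? sext32(rt) : (rt & 0xFFFFFFFFull);
    uint64_t t = a * b;
    HiLo r;
    r.lo = sext32(t);        /* (t_31)^32 || t_31..0  */
    r.hi = sext32(t >> 32);  /* (t_63)^32 || t_63..32 */
    return r;
}
```

Note that both halves are sign-extended even for the unsigned multiply, as the Operation blocks specify.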
NOR    Nor    NOR

Bits:    31..26     25..21    20..16    15..11    10..6    5..0
Field:   SPECIAL    rs        rt        rd        0        NOR
         000000                                   00000    100111
Width:   6          5         5         5         5        6





Format: 

NOR rd, rs, rt 

Description: 

The contents of general register rs are combined with the contents of 
general register rt in a bit-wise logical NOR operation. The result is placed 
into general register rd. 

Operation: 



GPR[rd] <- GPR[rs] nor GPR[rt] 



Exceptions: 

None 






OR    Or    OR

Bits:    31..26     25..21    20..16    15..11    10..6    5..0
Field:   SPECIAL    rs        rt        rd        0        OR
         000000                                   00000    100101
Width:   6          5         5         5         5        6



Format: 

OR rd, rs, rt 



Description: 

The contents of general register rs are combined with the contents of 
general register rt in a bit-wise logical OR operation. The result is placed 
into general register rd. 

Operation: 



GPR[rd] <- GPR[rs] or GPR[rt] 



Exceptions: 

None 
ORI    Or Immediate    ORI

Bits:    31..26    25..21    20..16    15..0
Field:   ORI       rs        rt        immediate
         001101
Width:   6         5         5         16







Format: 

ORI rt, rs, immediate 

Description: 

The 16-bit immediate is zero-extended and combined with the contents 
of general register rs in a bit-wise logical OR operation. The result is placed 
into general register rt.

Operation: 



T:  GPR[rt] <- GPR[rs]_63..16 || (immediate or GPR[rs]_15..0)



Exceptions: 

None 
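
Because the ORI immediate is zero-extended, it affects only bits 15..0 of the result. The C sketch below shows this, together with the common pairing of LUI and ORI to build a 32-bit constant. The function names are assumptions for illustration, and `model_lui` restates the LUI behavior described earlier in this appendix.

```c
#include <stdint.h>

/* ORI: OR the zero-extended 16-bit immediate into the register. */
static uint64_t model_ori(uint64_t rs, uint16_t immediate) {
    return rs | (uint64_t)immediate;
}

/* LUI: immediate in bits 31..16, sign-extended from bit 31. */
static uint64_t model_lui(uint16_t immediate) {
    uint64_t w = (uint64_t)immediate << 16;
    return (w & 0x80000000ull) ? (w | 0xFFFFFFFF00000000ull) : w;
}

/* Build a sign-extended 32-bit constant with LUI followed by ORI. */
static uint64_t load_const32(uint32_t value) {
    uint64_t r = model_lui((uint16_t)(value >> 16));
    return model_ori(r, (uint16_t)(value & 0xFFFF));
}
```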
SB    Store Byte    SB

Bits:    31..26    25..21    20..16    15..0
Field:   SB        base      rt        offset
         101000
Width:   6         5         5         16



Format: 

SB rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The least-significant byte of 
register rt is stored at the effective address. 

Operation: 



T:  vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
    (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
    pAddr <- pAddr_PSIZE-1..3 || (pAddr_2..0 xor ReverseEndian^3)
    byte <- vAddr_2..0 xor BigEndianCPU^3
    data <- GPR[rt]_63-8*byte..0 || 0^(8*byte)
    StoreMemory (uncached, BYTE, data, pAddr, vAddr, DATA)



Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
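
The `byte` and `data` lines of the Operation block position the register's low byte in the correct byte lane of the 64-bit data path. A hedged C sketch of that computation follows; `sb_bus_data` is a hypothetical name, and the function models only the shift, not the memory access itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* data <- GPR[rt]_63-8*byte..0 || 0^(8*byte),
 * with byte <- vAddr_2..0 xor BigEndianCPU^3. */
static uint64_t sb_bus_data(uint64_t rt, uint64_t vaddr, bool big_endian_cpu) {
    unsigned byte = (unsigned)(vaddr & 7) ^ (big_endian_cpu ? 7u : 0u);
    return rt << (8 * byte);
}
```

For a little-endian CPU, address offset 0 uses the lowest byte lane; for a big-endian CPU, the same offset uses the highest.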
SC    Store Conditional    SC

Bits:    31..26    25..21    20..16    15..0
Field:   SC        base      rt        offset
         111000
Width:   6         5         5         16







Format: 

SC rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of general register rt 
are conditionally stored at the memory location specified by the effective 
address. 

This instruction implicitly performs a SYNC operation; loads and 
stores to shared memory fetched prior to the SC must access memory 
before the SC; loads and stores to shared memory fetched subsequent to 
the SC must access memory after the SC. 

If any other processor or device has modified the physical address 
since the time of the previous Load Linked instruction, or if an ERET 
instruction occurs between the Load Linked instruction and this store 
instruction, the store fails and is inhibited from taking place. 

The success or failure of the store operation (as defined above) is
indicated by the contents of general register rt after execution of the
instruction. A successful store sets the contents of general register rt to 1;
an unsuccessful store sets it to 0.

The operation of Store Conditional is undefined when the address is
different from the address used in the last Load Linked.

This instruction is available in User mode; it is not necessary for CP0
to be enabled.

If either of the two least-significant bits of the effective address is non- 
zero, an address error exception takes place. 

If this instruction should both fail and take an exception, the exception 
takes precedence. 

Operation: 



T:  vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
    (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
    pAddr <- pAddr_PSIZE-1..3 || (pAddr_2..0 xor (ReverseEndian || 0^2))
    data <- GPR[rt]_63-8*byte..0 || 0^(8*byte)
    if LLbit then
        StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)
    endif
    GPR[rt] <- 0^63 || LLbit
    SyncOperation()



Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
SCD    Store Conditional Doubleword    SCD

Bits:    31..26    25..21    20..16    15..0
Field:   SCD       base      rt        offset
         111100
Width:   6         5         5         16







Format: 

SCD rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of general register rt 
are conditionally stored at the memory location specified by the effective 
address. 

This instruction implicitly performs a SYNC operation; loads and 
stores to shared memory fetched prior to the SCD must access memory 
before the SCD; loads and stores to shared memory fetched subsequent to 
the SCD must access memory after the SCD. 

If any other processor or device has modified the physical address 
since the time of the previous Load Linked Doubleword instruction, or if 
an ERET instruction occurs between the Load Linked Doubleword 
instruction and this store instruction, the store fails and is inhibited from 
taking place. 

The success or failure of the store operation (as defined above) is
indicated by the contents of general register rt after execution of the
instruction. A successful store sets the contents of general register rt to 1;
an unsuccessful store sets it to 0.

The operation of Store Conditional Doubleword is undefined when the
address is different from the address used in the last Load Linked
Doubleword.

This instruction is available in User mode; it is not necessary for CP0
to be enabled.

If any of the three least-significant bits of the effective address is
non-zero, an address error exception takes place.

If this instruction should both fail and take an exception, the exception 
takes precedence. 

Operation: 



T:  vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
    (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
    data <- GPR[rt]
    if LLbit then
        StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)
    endif
    GPR[rt] <- 0^63 || LLbit
    SyncOperation()



Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
SD    Store Doubleword    SD

Bits:    31..26    25..21    20..16    15..0
Field:   SD        base      rt        offset
         111111
Width:   6         5         5         16



Format: 

SD rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of general register rt 
are stored at the memory location specified by the effective address. 

If any of the three least-significant bits of the effective address is
non-zero, an address error exception occurs.

Operation: 



T:  vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
    (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
    data <- GPR[rt]
    StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)



Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
SDCz    Store Doubleword From Coprocessor    SDCz

Bits:    31..26     25..21    20..16    15..0
Field:   SDCz       base      rt        offset
         1111xx*
Width:   6          5         5         16



Format: 

SDCz rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. Coprocessor unit z sources a 
doubleword, which the processor writes to the addressed memory location. 
The data to be stored is defined by individual coprocessor specifications. 

If any of the three least-significant bits of the effective address are non- 
zero, an address error exception takes place. 

This instruction is not valid for use with CP0.

This instruction is undefined when the least-significant bit of the rt 
field is non-zero. 

Operation: 



T:  vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
    (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
    data <- COPzSD(rt)
    StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)



Note: *See the table in this section under "Opcode Bit Encoding." 
Also see "CPU Instruction Opcode Bit Encoding" at the end of Appendix A. 



Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
Coprocessor unusable exception 

Opcode Bit Encoding: 



SDCz    Bit #   31  30  29  28  27  26
SDC1             1   1   1   1   0   1
SDC2             1   1   1   1   1   0

Bits 31..28: SD opcode
Bits 27..26: Coprocessor Unit Number
SDL    Store Doubleword Left    SDL

Bits:    31..26    25..21    20..16    15..0
Field:   SDL       base      rt        offset
         101100
Width:   6         5         5         16







Format: 

SDL rt, offset(base) 

Description: 

This instruction can be used with the SDR instruction to store the 
contents of a register into eight consecutive bytes of memory, when the 
bytes cross a doubleword boundary. SDL stores the left portion of the 
register into the appropriate part of the high-order doubleword of memory; 
SDR stores the right portion of the register into the appropriate part of the 
low-order doubleword. 

The SDL instruction adds its sign-extended 16-bit offset to the
contents of general register base to form a virtual address which may
specify an arbitrary byte. It alters only the doubleword in memory which
contains that byte. From one to eight bytes will be stored, depending on
the starting byte specified.

Conceptually, it starts at the most-significant byte of the register and
copies it to the specified byte in memory; then it copies bytes from register
to memory until it reaches the low-order byte of the doubleword in memory.

No address exceptions due to alignment are possible. 
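
The byte copying described above can be sketched in C for the big-endian case, together with the matching SDR behavior so the pairing is visible. This is a simplified model under stated assumptions: memory is a plain byte array, address translation is ignored, and `sdl_be` and `sdr_be` are hypothetical names.

```c
#include <stdint.h>

/* SDL, big-endian: store register bytes, most-significant first,
 * from addr up to the end of the doubleword containing addr. */
static void sdl_be(uint8_t *mem, uint64_t addr, uint64_t reg) {
    unsigned n = 8 - (unsigned)(addr & 7);  /* bytes stored: 1..8 */
    for (unsigned i = 0; i < n; i++)
        mem[addr + i] = (uint8_t)(reg >> (56 - 8 * i));
}

/* SDR, big-endian: store register bytes, least-significant first,
 * from addr down to the start of the doubleword containing addr. */
static void sdr_be(uint8_t *mem, uint64_t addr, uint64_t reg) {
    unsigned n = (unsigned)(addr & 7) + 1;  /* bytes stored: 1..8 */
    for (unsigned i = 0; i < n; i++)
        mem[addr - i] = (uint8_t)(reg >> (8 * i));
}
```

Calling `sdl_be(mem, 1, r)` and then `sdr_be(mem, 8, r)` writes all eight register bytes to addresses 1 through 8, spanning the doubleword boundary.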



[Figure: SDL $24,1($0) with big-endian memory. Memory doublewords at
address 0 (bytes 0 to 7) and address 8 (bytes 8 to 15) are shown before and
after the store; register $24 holds A B C D E F G H.]



Operation: 



T:  vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
    (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
    pAddr <- pAddr_PSIZE-1..3 || (pAddr_2..0 xor ReverseEndian^3)
    if BigEndianMem = 0 then
        pAddr <- pAddr_31..3 || 0^3
    endif
    byte <- vAddr_2..0 xor BigEndianCPU^3
    data <- 0^(56-8*byte) || GPR[rt]_63..56-8*byte
    StoreMemory (uncached, byte, data, pAddr, vAddr, DATA)
Given a doubleword in a register and a doubleword in memory, the
operation of SDL is as follows:

SDL

Register:   A  B  C  D  E  F  G  H
Memory:     I  J  K  L  M  N  O  P

              BigEndianCPU = 0                   BigEndianCPU = 1
vAddr_2..0    destination  type  offset          destination  type  offset
                                 LEM  BEM                           LEM  BEM
0             IJKLMNOA     0     0    7          ABCDEFGH     7     0    0
1             IJKLMNAB     1     0    6          IABCDEFG     6     0    1
2             IJKLMABC     2     0    5          IJABCDEF     5     0    2
3             IJKLABCD     3     0    4          IJKABCDE     4     0    3
4             IJKABCDE     4     0    3          IJKLABCD     3     0    4
5             IJABCDEF     5     0    2          IJKLMABC     2     0    5
6             IABCDEFG     6     0    1          IJKLMNAB     1     0    6
7             ABCDEFGH     7     0    0          IJKLMNOA     0     0    7

LEM     Little-endian memory (BigEndianMem = 0)
BEM     BigEndianMem = 1
Type    AccessType (see Table 2.1 on page 2-3) sent to memory
Offset  pAddr_2..0 sent to memory

Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
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SDR 



Store Doubleword Right

Bits     31..26    25..21   20..16   15..0
Field    SDR       base     rt       offset
Value    101101
Width    6         5        5        16







Format: 

SDR rt, offset(base) 

Description: 

This instruction can be used with the SDL instruction to store the
contents of a register into eight consecutive bytes of memory, when the
bytes cross a boundary between two doublewords. SDR stores the right
portion of the register into the appropriate part of the low-order
doubleword; SDL stores the left portion of the register into the appropriate
part of the high-order doubleword of memory.

The SDR instruction adds its sign-extended 16-bit offset to the
contents of general register base to form a virtual address which may
specify an arbitrary byte. It alters only the doubleword in memory which
contains that byte. From one to eight bytes will be stored, depending on
the starting byte specified.

Conceptually, it starts at the least-significant (rightmost) byte of the
register and copies it to the specified byte in memory; then it copies bytes
from register to memory until it reaches the high-order byte of the
doubleword in memory. No address exceptions due to alignment are possible.









[Figure: contents of big-endian memory at addresses 0 and 8, before and
after executing SDR $24,4($0); register $24 holds bytes A through H.]



Operation: 



T:   vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
     (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
     pAddr <- pAddr_PSIZE-1..3 || (pAddr_2..0 xor ReverseEndian^3)
     if BigEndianMem = 1 then
         pAddr <- pAddr_PSIZE-1..3 || 0^3
     endif
     byte <- vAddr_2..0 xor BigEndianCPU^3
     data <- GPR[rt]_63-8*byte..0 || 0^(8*byte)
     StoreMemory (uncached, DOUBLEWORD-byte, data, pAddr, vAddr, DATA)



Given a doubleword in a register and a doubleword in memory, the 
operation of SDR is as follows: 
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SDR 

             Register:  A B C D E F G H
             Memory:    I J K L M N O P

             BigEndianCPU = 0                   BigEndianCPU = 1
vAddr_2..0   destination       type  offset     destination       type  offset
                                     LEM BEM                            LEM BEM
    0        A B C D E F G H    7     0   0     H J K L M N O P    0     7   0
    1        B C D E F G H P    6     1   0     G H K L M N O P    1     6   0
    2        C D E F G H O P    5     2   0     F G H L M N O P    2     5   0
    3        D E F G H N O P    4     3   0     E F G H M N O P    3     4   0
    4        E F G H M N O P    3     4   0     D E F G H N O P    4     3   0
    5        F G H L M N O P    2     5   0     C D E F G H O P    5     2   0
    6        G H K L M N O P    1     6   0     B C D E F G H P    6     1   0
    7        H J K L M N O P    0     7   0     A B C D E F G H    7     0   0

LEM     Little-endian memory (BigEndianMem = 0)

BEM     Big-endian memory (BigEndianMem = 1)

Type    AccessType (see Table 2.1 on page 2-3) sent to memory

Offset  pAddr_2..0 sent to memory
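
The destination columns of the SDL and SDR tables can be reproduced with
a short model. The Python sketch below is illustrative only and is not part
of the architecture definition. The function names sdl and sdr, and the
representation of a doubleword as an eight-character string with the
most-significant byte first, are assumptions made for this example.

```python
def sdl(reg: str, mem: str, vaddr: int, big_endian_cpu: bool) -> str:
    """Model SDL: return the memory doubleword after the store."""
    byte = (vaddr & 7) ^ (7 if big_endian_cpu else 0)
    count = byte + 1                      # bytes stored: 1 to 8
    # The most-significant register bytes replace the low-order end.
    return mem[:8 - count] + reg[:count]


def sdr(reg: str, mem: str, vaddr: int, big_endian_cpu: bool) -> str:
    """Model SDR: return the memory doubleword after the store."""
    byte = (vaddr & 7) ^ (7 if big_endian_cpu else 0)
    count = 8 - byte                      # bytes stored: 1 to 8
    # The least-significant register bytes replace the high-order end.
    return reg[byte:] + mem[count:]


if __name__ == "__main__":
    # Big-endian CPU: store register ABCDEFGH at the unaligned address 1.
    first_dw, second_dw = "01234567", "89abcdef"     # addresses 0-7, 8-15
    first_dw = sdl("ABCDEFGH", first_dw, 1, True)    # SDL $24,1($0)
    second_dw = sdr("ABCDEFGH", second_dw, 8, True)  # SDR $24,8($0)
    print(first_dw + second_dw)
```

With a big-endian CPU the string order matches address order, so the pair
of stores places A through H at addresses 1 through 8 and leaves the
remaining bytes of both doublewords unchanged.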

Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
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SH 



Store Halfword

Bits     31..26    25..21   20..16   15..0
Field    SH        base     rt       offset
Value    101001
Width    6         5        5        16







Format: 

SH rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form an unsigned effective address. The least-significant 
halfword of register rt is stored at the effective address. If the least- 
significant bit of the effective address is non-zero, an address error 
exception occurs. 

Operation: 



T:   vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
     (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
     pAddr <- pAddr_PSIZE-1..3 || (pAddr_2..0 xor (ReverseEndian^2 || 0))
     byte <- vAddr_2..0 xor (BigEndianCPU^2 || 0)
     data <- GPR[rt]_63-8*byte..0 || 0^(8*byte)
     StoreMemory (uncached, HALFWORD, data, pAddr, vAddr, DATA)



Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
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SLL 



Shift Left Logical

Bits     31..26    25..21   20..16   15..11   10..6    5..0
Field    SPECIAL   0        rt       rd       sa       SLL
Value    000000    00000                               000000
Width    6         5        5        5        5        6



Format: 

SLL rd, rt, sa 

Description: 

The contents of general register rt are shifted left by sa bits, inserting 
zeros into the low-order bits. 

The result is placed in register rd. 

The operand must be a valid sign-extended, 32-bit value. 

Operation: 



T:   s <- 0 || sa
     temp <- GPR[rt]_31-s..0 || 0^s
     GPR[rd] <- (temp_31)^32 || temp
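
The operation above can be sketched in Python as follows. This is a
minimal model, assuming 64-bit general registers held as non-negative
Python integers; the helper names sign_extend32 and sll are chosen for
this example and do not come from the manual.

```python
def sign_extend32(value: int) -> int:
    """Sign-extend the low 32 bits of value to a 64-bit pattern."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value


def sll(rt: int, sa: int) -> int:
    """Model SLL: shift the low word left, then sign-extend bit 31."""
    temp = (rt << (sa & 0x1F)) & 0xFFFFFFFF   # GPR[rt]_31-s..0 || 0^s
    return sign_extend32(temp)                # (temp_31)^32 || temp


if __name__ == "__main__":
    print(hex(sll(1, 31)))
```

Shifting a 1 into bit 31 produces a result whose upper 32 bits are all
ones, because the 32-bit result is always sign-extended into the 64-bit
destination register.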



Exceptions: 

None 
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SLLV Shift Left Logical Variable SLLV 



Bits     31..26    25..21   20..16   15..11   10..6    5..0
Field    SPECIAL   rs       rt       rd       0        SLLV
Value    000000                               00000    000100
Width    6         5        5        5        5        6



Format: 

SLLV rd, rt, rs 

Description: 

The contents of general register rt are shifted left by the number of bits
specified by the low-order five bits contained in general register rs,
inserting zeros into the low-order bits.

The result is placed in register rd. 

The operand must be a valid sign-extended, 32-bit value. 

Operation: 



T:   s <- 0 || GPR[rs]_4..0
     temp <- GPR[rt]_(31-s)..0 || 0^s
     GPR[rd] <- (temp_31)^32 || temp



Exceptions: 

None 
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SLT 



Set On Less Than

Bits     31..26    25..21   20..16   15..11   10..6    5..0
Field    SPECIAL   rs       rt       rd       0        SLT
Value    000000                               00000    101010
Width    6         5        5        5        5        6





Format: 

SLT rd, rs, rt 

Description: 

The contents of general register rt are subtracted from the contents of 
general register rs. Considering both quantities as signed integers, if the 
contents of general register rs are less than the contents of general register 
rt, the result is set to one; otherwise the result is set to zero. 

The result is placed into general register rd.

No integer overflow exception occurs under any circumstances. The 
comparison is valid even if the subtraction used during the comparison 
overflows. 



Operation: 



T:   if GPR[rs] < GPR[rt] then
         GPR[rd] <- 0^63 || 1
     else
         GPR[rd] <- 0^64
     endif



Exceptions: 

None 
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SLTI 



Set On Less Than Immediate

Bits     31..26    25..21   20..16   15..0
Field    SLTI      rs       rt       immediate
Value    001010
Width    6         5        5        16







Format: 

SLTI rt, rs, immediate 

Description: 

The 16-bit immediate is sign-extended and subtracted from the 
contents of general register rs. Considering both quantities as signed 
integers, if rs is less than the sign-extended immediate, the result is set to 
one; otherwise the result is set to zero. 

The result is placed into general register rt. 

No integer overflow exception occurs under any circumstances. The 
comparison is valid even if the subtraction used during the comparison 
overflows. 

Operation: 

T:   if GPR[rs] < ((immediate_15)^48 || immediate_15..0) then
         GPR[rt] <- 0^63 || 1
     else
         GPR[rt] <- 0^64
     endif




Exceptions: 

None 
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SLTIU 



Set On Less Than Immediate Unsigned

Bits     31..26    25..21   20..16   15..0
Field    SLTIU     rs       rt       immediate
Value    001011
Width    6         5        5        16







Format: 

SLTIU rt, rs, immediate 

Description: 

The 16-bit immediate is sign-extended and subtracted from the 
contents of general register rs. Considering both quantities as unsigned 
integers, if rs is less than the sign-extended immediate, the result is set to 
one; otherwise the result is set to zero. 

The result is placed into general register rt. 

No integer overflow exception occurs under any circumstances. The 
comparison is valid even if the subtraction used during the comparison 
overflows. 

Operation: 



T:   if (0 || GPR[rs]) < (0 || (immediate_15)^48 || immediate_15..0) then
         GPR[rt] <- 0^63 || 1
     else
         GPR[rt] <- 0^64
     endif
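
Because the immediate is sign-extended before the unsigned comparison, an
immediate with bit 15 set compares as a very large unsigned value. The
Python sketch below contrasts this with the signed SLTI comparison. It is
a minimal model under the assumption of 64-bit registers; the function
names are chosen for this example.

```python
MASK64 = (1 << 64) - 1


def sign_extend16(immediate: int) -> int:
    """Sign-extend a 16-bit immediate to a 64-bit pattern."""
    immediate &= 0xFFFF
    if immediate & 0x8000:
        immediate |= MASK64 ^ 0xFFFF
    return immediate


def to_signed64(value: int) -> int:
    """Interpret a 64-bit pattern as a two's-complement integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def slti(rs: int, immediate: int) -> int:
    """Signed comparison against the sign-extended immediate."""
    return 1 if to_signed64(rs) < to_signed64(sign_extend16(immediate)) else 0


def sltiu(rs: int, immediate: int) -> int:
    """Unsigned comparison against the sign-extended immediate."""
    return 1 if (rs & MASK64) < sign_extend16(immediate) else 0


if __name__ == "__main__":
    print(slti(5, 0xFFFF), sltiu(5, 0xFFFF))
```

For rs = 5 and an immediate field of 0xFFFF, the signed comparison tests
5 < -1 and yields 0, while the unsigned comparison tests 5 against
0xFFFFFFFFFFFFFFFF and yields 1.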



Exceptions: 

None 
SLTU    Set On Less Than Unsigned    SLTU





Bits     31..26    25..21   20..16   15..11   10..6    5..0
Field    SPECIAL   rs       rt       rd       0        SLTU
Value    000000                               00000    101011
Width    6         5        5        5        5        6





Format: 

SLTU rd, rs, rt 

Description: 

The contents of general register rt are subtracted from the contents of 
general register rs. Considering both quantities as unsigned integers, if 
the contents of general register rs are less than the contents of general 
register rt, the result is set to one; otherwise the result is set to zero. 

The result is placed into general register rd. 

No integer overflow exception occurs under any circumstances. The 
comparison is valid even if the subtraction used during the comparison 
overflows. 

Operation: 



T:   if (0 || GPR[rs]) < (0 || GPR[rt]) then
         GPR[rd] <- 0^63 || 1
     else
         GPR[rd] <- 0^64
     endif



Exceptions: 

None 
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SRA 



Shift Right Arithmetic

Bits     31..26    25..21   20..16   15..11   10..6    5..0
Field    SPECIAL   0        rt       rd       sa       SRA
Value    000000    00000                               000011
Width    6         5        5        5        5        6



Format: 

SRA rd, rt, sa 

Description: 

The contents of general register rt are shifted right by sa bits, sign- 
extending the high-order bits. 

The result is placed in register rd. 

The operand must be a valid sign-extended, 32-bit value. 

Operation: 



T:   s <- 0 || sa
     temp <- (GPR[rt]_31)^s || GPR[rt]_31..s
     GPR[rd] <- (temp_31)^32 || temp



Exceptions: 

None 
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SRAV 



Shift Right Arithmetic Variable

Bits     31..26    25..21   20..16   15..11   10..6    5..0
Field    SPECIAL   rs       rt       rd       0        SRAV
Value    000000                               00000    000111
Width    6         5        5        5        5        6



Format: 

SRAV rd, rt, rs 

Description: 

The contents of general register rt are shifted right by the number of 
bits specified by the low-order five bits of general register rs, sign- 
extending the high-order bits. 

The result is placed in register rd. 

The operand must be a valid sign-extended, 32-bit value. 



Operation: 



T:   s <- GPR[rs]_4..0
     temp <- (GPR[rt]_31)^s || GPR[rt]_31..s
     GPR[rd] <- (temp_31)^32 || temp



Exceptions: 

None 
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SRL 



Shift Right Logical

Bits     31..26    25..21   20..16   15..11   10..6    5..0
Field    SPECIAL   0        rt       rd       sa       SRL
Value    000000    00000                               000010
Width    6         5        5        5        5        6



Format: 

SRL rd, rt, sa 

Description: 

The contents of general register rt are shifted right by sa bits, inserting
zeros into the high-order bits.

The result is placed in register rd. 

The operand must be a valid sign-extended, 32-bit value. 

Operation: 



T:   s <- 0 || sa
     temp <- 0^s || GPR[rt]_31..s
     GPR[rd] <- (temp_31)^32 || temp
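
The difference between the logical shift (SRL) and the arithmetic shift
(SRA) lies only in what fills the vacated high-order bits of the 32-bit
word. The Python sketch below models both; it assumes 64-bit registers
held as non-negative integers, and the function names are chosen for this
example.

```python
def sign_extend32(value: int) -> int:
    """Sign-extend the low 32 bits of value to a 64-bit pattern."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value


def srl(rt: int, sa: int) -> int:
    """Model SRL: zeros enter at bit 31 of the word."""
    temp = (rt & 0xFFFFFFFF) >> (sa & 0x1F)
    return sign_extend32(temp)


def sra(rt: int, sa: int) -> int:
    """Model SRA: copies of bit 31 enter at the top of the word."""
    word = rt & 0xFFFFFFFF
    if word & 0x80000000:
        word -= 1 << 32          # Python's >> is arithmetic on negatives
    return sign_extend32(word >> (sa & 0x1F))


if __name__ == "__main__":
    value = 0xFFFFFFFF80000000   # sign-extended 32-bit 0x80000000
    print(hex(srl(value, 4)), hex(sra(value, 4)))
```

Note that SRL with a shift amount of zero still sign-extends bit 31 of the
word, so a properly sign-extended operand is returned unchanged.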



Exceptions: 

None 
SRLV    Shift Right Logical Variable    SRLV



Bits     31..26    25..21   20..16   15..11   10..6    5..0
Field    SPECIAL   rs       rt       rd       0        SRLV
Value    000000                               00000    000110
Width    6         5        5        5        5        6



Format: 

SRLV rd, rt, rs 

Description: 

The contents of general register rt are shifted right by the number of
bits specified by the low-order five bits of general register rs, inserting zeros
into the high-order bits.

The result is placed in register rd. 

The operand must be a valid sign-extended, 32-bit value. 

Operation: 



T:   s <- GPR[rs]_4..0
     temp <- 0^s || GPR[rt]_31..s
     GPR[rd] <- (temp_31)^32 || temp



Exceptions: 

None 
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SUB 



Subtract

Bits     31..26    25..21   20..16   15..11   10..6    5..0
Field    SPECIAL   rs       rt       rd       0        SUB
Value    000000                               00000    100010
Width    6         5        5        5        5        6





Format: 

SUB rd, rs, rt 

Description: 

The contents of general register rt are subtracted from the contents of 
general register rs to form a result. The result is placed into general 
register rd. The operands must be valid sign-extended, 32-bit values. 

The only difference between this instruction and the SUBU instruction 
is that SUBU never traps on overflow. 

An integer overflow exception takes place if the carries out of bits 30 
and 31 differ (2's complement overflow). The destination register rd is not 
modified when an integer overflow exception occurs. 

Operation: 



T:   temp <- GPR[rs] - GPR[rt]
     GPR[rd] <- (temp_31)^32 || temp_31..0
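
The overflow rule can be illustrated with a small Python model. This is a
sketch, not the processor's implementation: it detects overflow by checking
whether the true difference fits in a signed 32-bit word, which is
equivalent to the carry-based condition stated above. The built-in
OverflowError stands in for the integer overflow exception, and all
function names are assumptions made for this example.

```python
def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement word."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def sign_extend32(value: int) -> int:
    """Sign-extend the low 32 bits of value to a 64-bit pattern."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value


def sub(rs: int, rt: int) -> int:
    """Model SUB: trap on 2's-complement overflow, leaving rd unmodified."""
    result = to_signed32(rs) - to_signed32(rt)
    if not -(1 << 31) <= result < (1 << 31):
        raise OverflowError("integer overflow exception")
    return sign_extend32(result)


def subu(rs: int, rt: int) -> int:
    """Model SUBU: same subtraction, but overflow is never trapped."""
    return sign_extend32(rs - rt)


if __name__ == "__main__":
    most_negative = 0xFFFFFFFF80000000
    print(hex(subu(most_negative, 1)))
    try:
        sub(most_negative, 1)
    except OverflowError as exc:
        print("SUB trapped:", exc)
```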



Exceptions: 

Integer overflow exception 
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SUBU 



Subtract Unsigned

Bits     31..26    25..21   20..16   15..11   10..6    5..0
Field    SPECIAL   rs       rt       rd       0        SUBU
Value    000000                               00000    100011
Width    6         5        5        5        5        6



Format: 

SUBU rd, rs, rt 

Description: 

The contents of general register rt are subtracted from the contents of 
general register rs to form a result. 

The result is placed into general register rd. 

The operands must be valid sign-extended, 32-bit values. 

The only difference between this instruction and the SUB instruction 
is that SUBU never traps on overflow. No integer overflow exception occurs 
under any circumstances. 

Operation: 



T:   temp <- GPR[rs] - GPR[rt]
     GPR[rd] <- (temp_31)^32 || temp_31..0



Exceptions: 

None 
SW



Store Word

Bits     31..26    25..21   20..16   15..0
Field    SW        base     rt       offset
Value    101011
Width    6         5        5        16







Format: 

SW rt, offset(base)

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of general register rt 
are stored at the memory location specified by the effective address. 

If either of the two least-significant bits of the effective address is
non-zero, an address error exception occurs.

Operation: 



T:   vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
     (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
     pAddr <- pAddr_PSIZE-1..3 || (pAddr_2..0 xor (ReverseEndian || 0^2))
     byte <- vAddr_2..0 xor (BigEndianCPU || 0^2)
     data <- GPR[rt]_63-8*byte..0 || 0^(8*byte)
     StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)
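
The address calculation and alignment check shared by the aligned store
instructions can be sketched in Python. The model below is illustrative
only: it assumes 64-bit addresses, ignores address translation, and uses a
hypothetical AddressError class to stand in for the address error
exception.

```python
MASK64 = (1 << 64) - 1


class AddressError(Exception):
    """Hypothetical stand-in for the address error exception."""


def effective_address(base: int, offset: int) -> int:
    """Add the sign-extended 16-bit offset to the base register."""
    offset &= 0xFFFF
    if offset & 0x8000:
        offset -= 0x10000
    return (base + offset) & MASK64


def store_address(base: int, offset: int, size: int) -> int:
    """Return the virtual address of an aligned store of size bytes.

    size is 2 for SH and 4 for SW; a misaligned address raises.
    """
    vaddr = effective_address(base, offset)
    if vaddr & (size - 1):
        raise AddressError(hex(vaddr))
    return vaddr


if __name__ == "__main__":
    print(hex(store_address(0x1000, 0xFFFC, 4)))   # offset field encodes -4
```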



Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
SWCz    Store Word From Coprocessor    SWCz





Bits     31..26    25..21   20..16   15..0
Field    SWCz      base     rt       offset
Value    1110xx*
Width    6         5        5        16







Format: 

SWCz rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. Coprocessor unit z sources a word, 
which the processor writes to the addressed memory location. 

The data to be stored is defined by individual coprocessor 
specifications. 

This instruction is not valid for use with CP0.

If either of the two least-significant bits of the effective address is non- 
zero, an address error exception occurs. 

Execution of the instruction referencing coprocessor 3 causes a 
reserved instruction exception, not a coprocessor unusable exception. 

Operation: 



T:   vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
     (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
     pAddr <- pAddr_PSIZE-1..3 || (pAddr_2..0 xor (ReverseEndian || 0^2))
     byte <- vAddr_2..0 xor (BigEndianCPU || 0^2)
     data <- COPzSW (byte, rt)
     StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)



Note: *See the table in this section under "Opcode Bit Encoding." 
Also see "CPU Instruction Opcode Bit Encoding" at the end of Appendix A. 

Exceptions: 

TLB refill exception 

TLB invalid exception 

TLB modification exception 

Bus error exception 

Address error exception 

Coprocessor unusable exception 

Reserved instruction exception (coprocessor 3) 

Opcode Bit Encoding: 



SWCz     Bit #   31  30  29  28  27  26

SWC1              1   1   1   0   0   1
SWC2              1   1   1   0   1   0

Bits 31..28 hold the SW opcode; bits 27..26 hold the Coprocessor Unit Number.
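
The opcode bit encoding can be expressed as a short Python sketch. The
helper names swcz_opcode and encode_swcz are assumptions for this example;
only coprocessor units 1 and 2 are accepted, following the restrictions on
CP0 and coprocessor 3 stated in the description.

```python
def swcz_opcode(z: int) -> int:
    """Return the 6-bit SWCz opcode: 1110 followed by the unit number."""
    if z not in (1, 2):
        raise ValueError("SWCz is modeled here for coprocessor 1 or 2 only")
    return 0b111000 | z


def encode_swcz(z: int, base: int, rt: int, offset: int) -> int:
    """Assemble the 32-bit instruction word for SWCz rt, offset(base)."""
    return (
        (swcz_opcode(z) << 26)
        | ((base & 0x1F) << 21)
        | ((rt & 0x1F) << 16)
        | (offset & 0xFFFF)
    )


if __name__ == "__main__":
    # SWC1 with rt = 2, offset = 8, base = general register 29
    print(hex(encode_swcz(1, 29, 2, 8)))
```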
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SWL 



Store Word Left

Bits     31..26    25..21   20..16   15..0
Field    SWL       base     rt       offset
Value    101010
Width    6         5        5        16







Format: 

SWL rt, offset(base) 

Description: 

This instruction can be used with the SWR instruction to store the 
contents of a register into four consecutive bytes of memory, when the 
bytes cross a word boundary. SWL stores the left portion of the register 
into the appropriate part of the high-order word of memory; SWR stores the 
right portion of the register into the appropriate part of the low-order word. 

The SWL instruction adds its sign-extended 16-bit offset to the 
contents of general register base to form a virtual address which may 
specify an arbitrary byte. It alters only the word in memory which contains 
that byte. From one to four bytes will be stored, depending on the starting 
byte specified. 

Conceptually, it starts at the most-significant byte of the register and 
copies it to the specified byte in memory; then it copies bytes from register 
to memory until it reaches the low-order byte of the word in memory. 

No address exceptions due to alignment are possible. 



[Figure: contents of big-endian memory at addresses 0 and 4, before and
after executing SWL $24,1($0); register $24 holds bytes A through D.]



Operation: 



T:   vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
     (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
     pAddr <- pAddr_PSIZE-1..3 || (pAddr_2..0 xor ReverseEndian^3)
     if BigEndianMem = 0 then
         pAddr <- pAddr_PSIZE-1..2 || 0^2
     endif
     byte <- vAddr_1..0 xor BigEndianCPU^2
     if (vAddr_2 xor BigEndianCPU) = 0 then
         data <- 0^32 || 0^(24-8*byte) || GPR[rt]_31..24-8*byte
     else
         data <- 0^(24-8*byte) || GPR[rt]_31..24-8*byte || 0^32
     endif
     StoreMemory (uncached, byte, data, pAddr, vAddr, DATA)

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Given a doubleword in a register and a doubleword in memory, the 
operation of SWL is as follows: 



SWL          Register:  A B C D E F G H
             Memory:    I J K L M N O P

             BigEndianCPU = 0                   BigEndianCPU = 1
vAddr_2..0   destination       type  offset     destination       type  offset
                                     LEM BEM                            LEM BEM
    0        I J K L M N O E    0     0   7     E F G H M N O P    3     4   0
    1        I J K L M N E F    1     0   6     I E F G M N O P    2     4   1
    2        I J K L M E F G    2     0   5     I J E F M N O P    1     4   2
    3        I J K L E F G H    3     0   4     I J K E M N O P    0     4   3
    4        I J K E M N O P    0     4   3     I J K L E F G H    3     0   4
    5        I J E F M N O P    1     4   2     I J K L M E F G    2     0   5
    6        I E F G M N O P    2     4   1     I J K L M N E F    1     0   6
    7        E F G H M N O P    3     4   0     I J K L M N O E    0     0   7

LEM     Little-endian memory (BigEndianMem = 0)

BEM     Big-endian memory (BigEndianMem = 1)

Type    AccessType (see Table 2.1 on page 2-3) sent to memory

Offset  pAddr_2..0 sent to memory

Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
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SWR 



Store Word Right

Bits     31..26    25..21   20..16   15..0
Field    SWR       base     rt       offset
Value    101110
Width    6         5        5        16







Format: 

SWR rt, offset(base) 

Description: 

This instruction can be used with the SWL instruction to store the
contents of a register into four consecutive bytes of memory, when the
bytes cross a boundary between two words. SWR stores the right portion
of the register into the appropriate part of the low-order word; SWL stores
the left portion of the register into the appropriate part of the high-order
word of memory.

The SWR instruction adds its sign-extended 16-bit offset to the
contents of general register base to form a virtual address which may
specify an arbitrary byte. It alters only the word in memory which contains
that byte. From one to four bytes will be stored, depending on the starting
byte specified.

Conceptually, it starts at the least-significant (rightmost) byte of the
register and copies it to the specified byte in memory; then it copies bytes
from register to memory until it reaches the high-order byte of the word in
memory.

No address exceptions due to alignment are possible. 



[Figure: contents of big-endian memory at addresses 0 and 4, before and
after executing SWR $24,1($0); register $24 holds bytes A through D.]
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Operation: 



T:   vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
     (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
     pAddr <- pAddr_PSIZE-1..3 || (pAddr_2..0 xor ReverseEndian^3)
     if BigEndianMem = 1 then
         pAddr <- pAddr_PSIZE-1..2 || 0^2
     endif
     byte <- vAddr_1..0 xor BigEndianCPU^2
     if (vAddr_2 xor BigEndianCPU) = 0 then
         data <- 0^32 || GPR[rt]_31-8*byte..0 || 0^(8*byte)
     else
         data <- GPR[rt]_31-8*byte..0 || 0^(8*byte) || 0^32
     endif
     StoreMemory (uncached, WORD-byte, data, pAddr, vAddr, DATA)



Given a doubleword in a register and a doubleword in memory, the
operation of SWR is as follows:

SWR          Register:  A B C D E F G H
             Memory:    I J K L M N O P

             BigEndianCPU = 0                   BigEndianCPU = 1
vAddr_2..0   destination       type  offset     destination       type  offset
                                     LEM BEM                            LEM BEM
    0        I J K L E F G H    3     0   4     H J K L M N O P    0     7   0
    1        I J K L F G H P    2     1   4     G H K L M N O P    1     6   0
    2        I J K L G H O P    1     2   4     F G H L M N O P    2     5   0
    3        I J K L H N O P    0     3   4     E F G H M N O P    3     4   0
    4        E F G H M N O P    3     4   0     I J K L H N O P    0     3   4
    5        F G H L M N O P    2     5   0     I J K L G H O P    1     2   4
    6        G H K L M N O P    1     6   0     I J K L F G H P    2     1   4
    7        H J K L M N O P    0     7   0     I J K L E F G H    3     0   4

LEM     Little-endian memory (BigEndianMem = 0)

BEM     Big-endian memory (BigEndianMem = 1)

Type    AccessType (see Table 2.1 on page 2-3) sent to memory

Offset  pAddr_2..0 sent to memory
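
The destination columns of the SWL and SWR tables can be reproduced with
a short model. The Python sketch below is illustrative only. It represents
a doubleword as an eight-character string with the most-significant byte
first, as in the tables; the function names and the helper _select_word
are assumptions made for this example.

```python
def _select_word(vaddr: int, big_endian_cpu: bool):
    """Return (byte, start): the byte index within the word and the
    string position of the affected word inside the doubleword."""
    v = vaddr & 7
    flip = 1 if big_endian_cpu else 0
    byte = (v & 3) ^ (3 if big_endian_cpu else 0)
    left_half = ((v >> 2) ^ flip) == 1    # vAddr_2 xor BigEndianCPU
    return byte, (0 if left_half else 4)


def swl(reg: str, mem: str, vaddr: int, big_endian_cpu: bool) -> str:
    """Model SWL using the low-order register word (last four chars)."""
    byte, start = _select_word(vaddr, big_endian_cpu)
    word, old = reg[4:], mem[start:start + 4]
    count = byte + 1                       # bytes stored: 1 to 4
    new = old[:4 - count] + word[:count]
    return mem[:start] + new + mem[start + 4:]


def swr(reg: str, mem: str, vaddr: int, big_endian_cpu: bool) -> str:
    """Model SWR using the low-order register word (last four chars)."""
    byte, start = _select_word(vaddr, big_endian_cpu)
    word, old = reg[4:], mem[start:start + 4]
    count = 4 - byte                       # bytes stored: 1 to 4
    new = word[byte:] + old[count:]
    return mem[:start] + new + mem[start + 4:]


if __name__ == "__main__":
    for v in range(8):
        print(v, swl("ABCDEFGH", "IJKLMNOP", v, False),
              swr("ABCDEFGH", "IJKLMNOP", v, False))
```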

Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
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SYNC 



Synchronize

Bits     31..26    25..6                       5..0
Field    SPECIAL   0                           SYNC
Value    000000    0000 0000 0000 0000 0000    001111
Width    6         20                          6



Format: 

SYNC 

Description: 

The SYNC instruction ensures that any loads and stores fetched prior 
to the present instruction are completed before any loads or stores after 
this instruction are allowed to start. Use of the SYNC instruction to 
serialize certain memory references may be required in a multiprocessor
environment for proper synchronization. For example: 



    Processor A                 Processor B

        SW    R1, DATA          1:  LW    R2, FLAG
        LI    R2, 1                 BEQ   R2, R0, 1B
        SYNC                        NOP
        SW    R2, FLAG              SYNC
                                    LW    R1, DATA



The SYNC in processor A prevents DATA being written after FLAG, 
which could cause processor B to read stale data. The SYNC in processor 
B prevents DATA from being read before FLAG, which could likewise result 
in reading stale data. For processors which only execute loads and stores 
in order, with respect to shared memory, this instruction is a NOP. 

LL and SC instructions implicitly perform a SYNC. 

This instruction is allowed in User mode. 



Operation: 



T:   SyncOperation()



Exceptions: 

None 
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SYSCALL System Call SYSCALL 



Bits     31..26    25..6    5..0
Field    SPECIAL   Code     SYSCALL
Value    000000             001100
Width    6         20       6



Format: 

SYSCALL 

Description: 

A system call exception occurs, immediately and unconditionally 
transferring control to the exception handler. 

The code field is available for use as software parameters, but is 
retrieved by the exception handler only by loading the contents of the 
memory word containing the instruction. 
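
As a sketch of how an exception handler might recover the code field once
it has loaded the instruction word, the following Python functions pack and
unpack bits 25..6. The function names are assumptions for this example, and
how the handler locates and loads the instruction word is outside the scope
of the sketch.

```python
SYSCALL_FUNCT = 0b001100      # function field, bits 5..0
CODE_MASK = 0xFFFFF           # 20-bit code field, bits 25..6


def encode_syscall(code: int) -> int:
    """Build a SYSCALL instruction word carrying a 20-bit code."""
    return ((code & CODE_MASK) << 6) | SYSCALL_FUNCT


def syscall_code(instruction: int) -> int:
    """Extract the code field from a loaded SYSCALL instruction word."""
    return (instruction >> 6) & CODE_MASK


if __name__ == "__main__":
    word = encode_syscall(0x12345)
    print(hex(word), hex(syscall_code(word)))
```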

Operation: 



SystemCallException 



Exceptions: 

System Call exception 
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TEQ 



Trap If Equal

Bits     31..26    25..21   20..16   15..6    5..0
Field    SPECIAL   rs       rt       code     TEQ
Value    000000                               110100
Width    6         5        5        10       6





Format: 

TEQ rs, rt 

Description: 

The contents of general register rt are compared to general register rs. 
If the contents of general register rs are equal to the contents of general 
register rt, a trap exception occurs.

The code field is available for use as software parameters, but is 
retrieved by the exception handler only by loading the contents of the 
memory word containing the instruction. 

Operation: 



T:   if GPR[rs] = GPR[rt] then
         TrapException
     endif



Exceptions: 

Trap exception 
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TEQI 



Trap If Equal Immediate

Bits     31..26    25..21   20..16   15..0
Field    REGIMM    rs       TEQI     immediate
Value    000001             01100
Width    6         5        5        16



Format: 

TEQI rs, immediate 

Description: 

The 16-bit immediate is sign-extended and compared to the contents 
of general register rs. If the contents of general register rs are equal to the 
sign-extended immediate, a trap exception occurs. 

Operation: 



T:   if GPR[rs] = ((immediate_15)^48 || immediate_15..0) then
         TrapException
     endif



Exceptions: 

Trap exception 
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TGE 



Trap If Greater Than Or Equal

Bits     31..26    25..21   20..16   15..6    5..0
Field    SPECIAL   rs       rt       code     TGE
Value    000000                               110000
Width    6         5        5        10       6





Format: 

TGE rs, rt 

Description: 

The contents of general register rt are compared to the contents of 
general register rs. Considering both quantities as signed integers, if the 
contents of general register rs are greater than or equal to the contents of 
general register rt, a trap exception occurs.

The code field is available for use as software parameters, but is 
retrieved by the exception handler only by loading the contents of the 
memory word containing the instruction. 

Operation: 



T:   if GPR[rs] >= GPR[rt] then
         TrapException
     endif



Exceptions: 

Trap exception 
TGEI    Trap If Greater Than Or Equal Immediate    TGEI

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Format: 

TGEI rs, immediate 

Description: 

The 16-bit immediate is sign-extended and compared to the contents 
of general register rs. Considering both quantities as signed integers, if 
the contents of general register rs are greater than or equal to the sign- 
extended immediate, a trap exception occurs. 

Operation: 



T:   if GPR[rs] >= ((immediate_15)^48 || immediate_15..0) then
         TrapException
     endif



Exceptions: 

Trap exception 
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TGEIU 



Trap If Greater Than Or Equal Immediate Unsigned

Bits     31..26    25..21   20..16   15..0
Field    REGIMM    rs       TGEIU    immediate
Value    000001             01001
Width    6         5        5        16



Format: 

TGEIU rs, immediate 

Description: 

The 16-bit immediate is sign-extended and compared to the contents 
of general register rs. Considering both quantities as unsigned integers, if 
the contents of general register rs are greater than or equal to the sign- 
extended immediate, a trap exception occurs. 

Operation: 



T:   if (0 || GPR[rs]) >= (0 || (immediate_15)^48 || immediate_15..0) then
         TrapException
     endif



Exceptions: 

Trap exception 
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Format: 

TGEU rs, rt 

Description: 

The contents of general register rt are compared to the contents of 
general register rs. Considering both quantities as unsigned integers, if 
the contents of general register rs are greater than or equal to the contents 
of general register rt, a trap exception occurs.

The code field is available for use as software parameters, but is 
retrieved by the exception handler only by loading the contents of the 
memory word containing the instruction. 

Operation: 



T:   if (0 || GPR[rs]) >= (0 || GPR[rt]) then
         TrapException
     endif



Exceptions: 

Trap exception 
TLBP    Probe TLB For Matching Entry    TLBP



Bits     31..26    25    24..6                      5..0
Field    COP0      CO    0                          TLBP
Value    010000    1     000 0000 0000 0000 0000    001000
Width    6         1     19                         6



Format: 

TLBP 

Description: 

The Index register is loaded with the address of the TLB entry whose 
contents match the contents of the EntryHi register. If no TLB entry 
matches, the high-order bit of the Index register is set. 

The architecture does not specify the operation of memory references 
associated with the instruction immediately after a TLBP instruction, nor 
is the operation specified if more than one TLB entry matches. 

Operation: 



T:   Index <- 1 || 0^31
     for i in 0..TLBEntries-1
         if (TLB[i]_167..141 and not (0^15 || TLB[i]_216..205))
            = (EntryHi_39..13 and not (0^15 || TLB[i]_216..205)) and
            (TLB[i]_140 or (TLB[i]_135..128 = EntryHi_7..0)) then
             Index <- 0^26 || i_5..0
         endif
     endfor
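
The probe loop can be paraphrased in Python. The sketch below is a
simplified model, not the hardware algorithm: it replaces the raw TLB bit
ranges with named fields (mask, vpn2, asid, g) in a hypothetical TlbEntry
class, and the function name tlbp is likewise an assumption. As in the
operation above, Index starts with its high-order bit set and is
overwritten when an entry matches.

```python
from dataclasses import dataclass
from typing import List

NO_MATCH = 1 << 31            # high-order bit of Index: probe failed


@dataclass
class TlbEntry:
    mask: int                 # 1 bits are excluded from the VPN2 compare
    vpn2: int                 # virtual page number divided by two
    asid: int                 # address space identifier
    g: bool                   # global bit: ignore the ASID when set


def tlbp(tlb: List[TlbEntry], vpn2: int, asid: int) -> int:
    """Return the Index register value after probing for (vpn2, asid)."""
    index = NO_MATCH
    for i, entry in enumerate(tlb):
        vpn_match = (entry.vpn2 & ~entry.mask) == (vpn2 & ~entry.mask)
        if vpn_match and (entry.g or entry.asid == asid):
            index = i & 0x3F
    return index


if __name__ == "__main__":
    tlb = [TlbEntry(0, 0x10, 1, False), TlbEntry(0x3, 0x20, 2, False)]
    print(tlbp(tlb, 0x23, 2), hex(tlbp(tlb, 0x10, 2)))
```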







Exceptions: 

Coprocessor unusable exception 
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TLBR 



Read Indexed TLB Entry

Bits     31..26    25    24..6                      5..0
Field    COP0      CO    0                          TLBR
Value    010000    1     000 0000 0000 0000 0000    000001
Width    6         1     19                         6





Format: 

TLBR 

Description: 

The G bit (which controls ASID matching) read from the TLB is written
into both of the EntryLo0 and EntryLo1 registers.

The EntryHi and EntryLo registers are loaded with the contents of the 
TLB entry pointed at by the contents of the TLB Index register. The 
operation is invalid (and the results are unspecified) if the contents of the 
TLB Index register are greater than the number of TLB entries in the 
processor. 

Operation: 



T:   PageMask <- TLB[Index_5..0]_255..192
     EntryHi  <- TLB[Index_5..0]_191..128 and not TLB[Index_5..0]_255..192
     EntryLo1 <- TLB[Index_5..0]_127..65 || TLB[Index_5..0]_140
     EntryLo0 <- TLB[Index_5..0]_63..1 || TLB[Index_5..0]_140



Exceptions: 

Coprocessor unusable exception 
TLBWI    Write Indexed TLB Entry    TLBWI





Bits     31..26    25    24..6                      5..0
Field    COP0      CO    0                          TLBWI
Value    010000    1     000 0000 0000 0000 0000    000010
Width    6         1     19                         6





Format: 

TLBWI 



Description: 

The G bit of the TLB is written with the logical AND of the G bits in the
EntryLo0 and EntryLo1 registers.

The TLB entry pointed at by the contents of the TLB Index register is 
loaded with the contents of the EntryHi and EntryLo registers. 

The operation is invalid (and the results are unspecified) if the contents 
of the TLB Index register are greater than the number of TLB entries in the 
processor. 

Operation: 



T:   TLB[Index_5..0] <-
         PageMask || (EntryHi and not PageMask) || EntryLo1 || EntryLo0



Exceptions: 

Coprocessor unusable exception 
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TLBWR Write Random TLB Entry TLBWR 



Bits     31..26    25    24..6                      5..0
Field    COP0      CO    0                          TLBWR
Value    010000    1     000 0000 0000 0000 0000    000110
Width    6         1     19                         6



Format: 

TLBWR 

Description: 

The G bit of the TLB is written with the logical AND of the G bits in the
EntryLo0 and EntryLo1 registers.

The TLB entry pointed at by the contents of the TLB Random register 
is loaded with the contents of the EntryHi and EntryLo registers. 

Operation: 



T:   TLB[Random_5..0] <-
         PageMask || (EntryHi and not PageMask) || EntryLo1 || EntryLo0



Exceptions: 

Coprocessor unusable exception 
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TLT 



Trap If Less Than

Bits     31..26    25..21   20..16   15..6    5..0
Field    SPECIAL   rs       rt       code     TLT
Value    000000                               110010
Width    6         5        5        10       6



Format: 

TLT rs, rt 



Description: 

The contents of general register rt are compared to general register rs. 
Considering both quantities as signed integers, if the contents of general 
register rs are less than the contents of general register rt a trap exception 
occurs. 

The code field is available for use as software parameters, but is 
retrieved by the exception handler only by loading the contents of the 
memory word containing the instruction. 
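
As an illustration of how a handler might recover the code field after loading the instruction word, the following Python sketch extracts bits 15..6. The helper names `trap_code` and `encode_tlt` are hypothetical; the field positions are taken from the instruction format above.

```python
def encode_tlt(rs, rt, code=0):
    """Assemble TLT rs, rt: SPECIAL (000000) opcode, function 110010."""
    return (rs << 21) | (rt << 16) | ((code & 0x3FF) << 6) | 0b110010

def trap_code(instruction_word):
    """Return the 10-bit code field held in bits 15..6 of the word."""
    return (instruction_word >> 6) & 0x3FF
```

For example, `trap_code(encode_tlt(4, 5, 7))` recovers the value 7 that was placed in the code field.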

Operation: 



T: if GPR[rs] < GPR[rt] then 
TrapException 
endif 



Exceptions: 

Trap exception 
TLTI                 Trap If Less Than Immediate                 TLTI

   31    26  25   21  20   16  15                   0
   REGIMM    rs       TLTI     immediate
   000001             01010
      6       5        5            16
Format: 

TLTI rs, immediate 

Description: 

The 16-bit immediate is sign-extended and compared to the contents 
of general register rs. Considering both quantities as signed integers, if the 
contents of general register rs are less than the sign-extended immediate, 
a trap exception occurs. 

Operation: 



T: if GPR[rs] < (immediate_{15})^{48} || immediate_{15..0} then 
TrapException 
endif 



Exceptions: 

Trap exception 
TLTIU           Trap If Less Than Immediate Unsigned           TLTIU

   31    26  25   21  20   16  15                   0
   REGIMM    rs       TLTIU    immediate
   000001             01011
      6       5        5            16
Format: 

TLTIU rs, immediate 

Description: 

The 16-bit immediate is sign-extended and compared to the contents 
of general register rs. Considering both quantities as unsigned integers, if 
the contents of general register rs are less than the sign-extended 
immediate, a trap exception occurs. 

Operation: 



T: if (0 || GPR[rs]) < (0 || (immediate_{15})^{48} || immediate_{15..0}) then 
TrapException 
endif 



Exceptions: 

Trap exception 
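
The difference between the signed comparison of TLTI and the unsigned comparison of TLTIU is easiest to see with a negative immediate. The Python sketch below assumes 64-bit general registers held as unsigned bit patterns; all function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def sign_extend16(immediate):
    """(immediate_15)^48 || immediate_15..0, returned as a 64-bit pattern."""
    immediate &= 0xFFFF
    return ((immediate | ~0xFFFF) & MASK64) if immediate & 0x8000 else immediate

def to_signed64(value):
    """Interpret a 64-bit pattern as a two's-complement integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value

def tlti_traps(gpr_rs, immediate):
    """Signed comparison, as in TLTI."""
    return to_signed64(gpr_rs) < to_signed64(sign_extend16(immediate))

def tltiu_traps(gpr_rs, immediate):
    """Unsigned comparison of the same two bit patterns, as in TLTIU."""
    return (gpr_rs & MASK64) < sign_extend16(immediate)
```

With rs = 1 and immediate = 0xFFFF, the immediate sign-extends to all ones: TLTI compares 1 with -1 and does not trap, while TLTIU compares 1 with the largest unsigned value and traps.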
TLTU                 Trap If Less Than Unsigned                 TLTU

   31     26  25   21  20   16  15          6  5      0
   SPECIAL    rs       rt       code           TLTU
   000000                                      110011
      6        5        5        10              6
Format: 

TLTU rs, rt 

Description: 

The contents of general register rt are compared to general register rs. 
Considering both quantities as unsigned integers, if the contents of general 
register rs are less than the contents of general register rt a trap exception 
occurs. 

The code field is available for use as software parameters, but is 
retrieved by the exception handler only by loading the contents of the 
memory word containing the instruction. 

Operation: 



T: if (0 || GPR[rs]) < (0 || GPR[rt]) then 
      TrapException 
   endif 



Exceptions: 

Trap exception 
TNE                      Trap If Not Equal                      TNE

   31     26  25   21  20   16  15          6  5      0
   SPECIAL    rs       rt       code           TNE
   000000                                      110110
      6        5        5        10              6
Format: 

TNE rs, rt 

Description: 

The contents of general register rt are compared to general register rs. 
If the contents of general register rs are not equal to the contents of general 
register rt a trap exception occurs. 

The code field is available for use as software parameters, but is 
retrieved by the exception handler only by loading the contents of the 
memory word containing the instruction. 

Operation: 



T: if GPR[rs] ≠ GPR[rt] then 
TrapException 
endif 



Exceptions: 

Trap exception 
TNEI                 Trap If Not Equal Immediate                 TNEI

   31    26  25   21  20   16  15                   0
   REGIMM    rs       TNEI     immediate
   000001             01110
      6       5        5            16
Format: 

TNEI rs, immediate 

Description: 

The 16-bit immediate is sign-extended and compared to the contents 
of general register rs. If the contents of general register rs are not equal to 
the sign-extended immediate, a trap exception occurs. 

Operation: 



T: if GPR[rs] ≠ (immediate_{15})^{48} || immediate_{15..0} then 
TrapException 
endif 



Exceptions: 

Trap exception 
WAIT                            Wait                            WAIT

   31    26  25  24                        6  5      0
   COP0      CO    000 0000 0000 0000 0000    WAIT
   010000    1                                100000
      6      1              19                   6
Format: 

WAIT 

Description: 

The WAIT instruction is used to halt the internal pipeline and thus 
reduce the power consumption of the CPU. See Appendix G for more 
details. 



Operation: 



T: if SysAD bus is idle then 
StopPipeline 
endif 



Exceptions: 

Coprocessor unusable exception 
XOR                        Exclusive Or                        XOR

   31     26  25   21  20   16  15   11  10    6  5      0
   SPECIAL    rs       rt       rd       00000    XOR
   000000                                         100110
      6        5        5        5         5        6
Format: 

XOR rd, rs, rt 



Description: 

The contents of general register rs are combined with the contents of 
general register rt in a bit-wise logical exclusive OR operation. 
The result is placed into general register rd. 

Operation: 



T: GPR[rd] <- GPR[rs] xor GPR[rt] 



Exceptions: 

None 
XORI                  Exclusive OR Immediate                  XORI

   31    26  25   21  20   16  15                   0
   XORI      rs       rt       immediate
   001110
      6       5        5            16
Format: 

XORI rt, rs, immediate 

Description: 

The 16-bit immediate is zero-extended and combined with the contents 
of general register rs in a bit-wise logical exclusive OR operation. 
The result is placed into general register rt. 

Operation: 



T: GPR[rt] <- GPR[rs] xor (0^{48} || immediate) 



Exceptions: 

None 
CPU Instruction Opcode Bit Encoding 

The remainder of this Appendix presents the opcode bit encoding for 
the CPU instruction set (ISA and extensions), as implemented by the 
R4600/R4700. 

Table A.4 lists the R4600/R4700 Opcode Bit Encoding. 
Opcode (rows: bits 31..29, columns: bits 28..26)

          0        1        2        3        4        5        6        7
   0   SPECIAL  REGIMM   J        JAL      BEQ      BNE      BLEZ     BGTZ
   1   ADDI     ADDIU    SLTI     SLTIU    ANDI     ORI      XORI     LUI
   2   COP0     COP1     COP2     *        BEQL     BNEL     BLEZL    BGTZL
   3   DADDI    DADDIU   LDL      LDR      *        *        *        *
   4   LB       LH       LWL      LW       LBU      LHU      LWR      LWU
   5   SB       SH       SWL      SW       SDL      SDR      SWR      CACHE δ
   6   LL       LWC1     LWC2     *        LLD      LDC1     LDC2     LD
   7   SC       SWC1     SWC2     *        SCD      SDC1     SDC2     SD

SPECIAL function (rows: bits 5..3, columns: bits 2..0)

          0        1        2        3        4        5        6        7
   0   SLL      *        SRL      SRA      SLLV     *        SRLV     SRAV
   1   JR       JALR     *        *        SYSCALL  BREAK    *        SYNC
   2   MFHI     MTHI     MFLO     MTLO     DSLLV    *        DSRLV    DSRAV
   3   MULT     MULTU    DIV      DIVU     DMULT    DMULTU   DDIV     DDIVU
   4   ADD      ADDU     SUB      SUBU     AND      OR       XOR      NOR
   5   *        *        SLT      SLTU     DADD     DADDU    DSUB     DSUBU
   6   TGE      TGEU     TLT      TLTU     TEQ      *        TNE      *
   7   DSLL     *        DSRL     DSRA     DSLL32   *        DSRL32   DSRA32

REGIMM rt (rows: bits 20..19, columns: bits 18..16)

          0        1        2        3        4        5        6        7
   0   BLTZ     BGEZ     BLTZL    BGEZL    *        *        *        *
   1   TGEI     TGEIU    TLTI     TLTIU    TEQI     *        TNEI     *
   2   BLTZAL   BGEZAL   BLTZALL  BGEZALL  *        *        *        *
   3   *        *        *        *        *        *        *        *

COPz rs (rows: bits 25,24, columns: bits 23..21)

          0        1        2        3        4        5        6        7
   0   MF       DMF      CF       γ        MT       DMT      CT       γ
   1   BC       γ        γ        γ        γ        γ        γ        γ
   2   CO       (all columns)
   3   CO       (all columns)

COPz rt (rows: bits 20..19, columns: bits 18..16)

          0        1        2        3        4        5        6        7
   0   BCF      BCT      BCFL     BCTL     γ        γ        γ        γ
   1   γ        γ        γ        γ        γ        γ        γ        γ
   2   γ        γ        γ        γ        γ        γ        γ        γ
   3   γ        γ        γ        γ        γ        γ        γ        γ

CP0 Function (rows: bits 5..3, columns: bits 2..0)

          0        1        2        3        4        5        6        7
   0   *        TLBR     TLBWI    φ        φ        *        TLBWR    φ
   1   TLBP     φ        φ        φ        φ        φ        φ        *
   2   φ        φ        φ        φ        φ        φ        φ        φ
   3   ERET     φ        *        φ        φ        *        φ        φ
   4   WAIT     φ        φ        φ        φ        φ        φ        φ
   5   φ        φ        φ        φ        φ        φ        φ        φ
   6   φ        φ        φ        φ        φ        φ        φ        φ
   7   φ        φ        φ        φ        φ        φ        φ        φ


Key to Table: 

*  Operation codes marked with an asterisk cause reserved instruction 
   exceptions in all current implementations and are reserved for future 
   versions of the architecture. 

γ  Operation codes marked with a gamma cause a reserved instruction 
   exception. They are reserved for future versions of the architecture. 

δ  Operation codes marked with a delta are valid only for R4600 processors 
   with CP0 enabled, and cause a reserved instruction exception on other 
   processors. 

φ  Operation codes marked with a phi are invalid but do not cause reserved 
   instruction exceptions in R4600 implementations. 

Table A.4 
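
The row and column organization of the opcode table maps directly onto the instruction word: bits 31..29 select the row and bits 28..26 select the column. The Python sketch below shows one way a disassembler might index the Opcode and REGIMM rt sub-tables. The names `OPCODE`, `REGIMM_RT`, and `decode` are hypothetical, and entries shown as `*` stand for reserved encodings.

```python
OPCODE = [
    ["SPECIAL", "REGIMM", "J", "JAL", "BEQ", "BNE", "BLEZ", "BGTZ"],
    ["ADDI", "ADDIU", "SLTI", "SLTIU", "ANDI", "ORI", "XORI", "LUI"],
    ["COP0", "COP1", "COP2", "*", "BEQL", "BNEL", "BLEZL", "BGTZL"],
    ["DADDI", "DADDIU", "LDL", "LDR", "*", "*", "*", "*"],
    ["LB", "LH", "LWL", "LW", "LBU", "LHU", "LWR", "LWU"],
    ["SB", "SH", "SWL", "SW", "SDL", "SDR", "SWR", "CACHE"],
    ["LL", "LWC1", "LWC2", "*", "LLD", "LDC1", "LDC2", "LD"],
    ["SC", "SWC1", "SWC2", "*", "SCD", "SDC1", "SDC2", "SD"],
]

REGIMM_RT = [
    ["BLTZ", "BGEZ", "BLTZL", "BGEZL", "*", "*", "*", "*"],
    ["TGEI", "TGEIU", "TLTI", "TLTIU", "TEQI", "*", "TNEI", "*"],
    ["BLTZAL", "BGEZAL", "BLTZALL", "BGEZALL", "*", "*", "*", "*"],
    ["*", "*", "*", "*", "*", "*", "*", "*"],
]

def decode(word):
    """Return the mnemonic selected by the opcode (and rt for REGIMM)."""
    op = (word >> 26) & 0x3F            # bits 31..26
    name = OPCODE[op >> 3][op & 7]      # row = bits 31..29, column = bits 28..26
    if name == "REGIMM":
        rt = (word >> 16) & 0x1F        # bits 20..16
        return REGIMM_RT[rt >> 3][rt & 7]
    return name
```

This sketch stops at the major opcode for SPECIAL and COPz; a fuller decoder would index the remaining sub-tables in the same way.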
Introduction 

This appendix provides a detailed description of each floating-point 
unit (FPU) instruction (refer to Appendix A for a detailed description of the 
CPU instructions). The instructions are listed alphabetically, and any 
exceptions that may occur due to the execution of each instruction are 
listed after the description of each instruction. Descriptions of the 
immediate causes and the manner of handling exceptions are omitted from 
the instruction descriptions in this appendix (refer to Chapter 7 for 
detailed descriptions of floating-point exceptions and handling). 

Figure B.3 on page B-45 lists the entire bit encoding for the constant 
fields of the floating-point instruction set; the bit encoding for each 
instruction is included with that individual instruction. 

Instruction Formats 

There are three basic instruction format types: 

• I-Type, or Immediate instructions, which include load and store 
operations 

• M-Type, or Move instructions 

• R-Type, or Register instructions, which include the two- and 
three-register floating-point operations. 

The instruction description subsections that follow show how these 
three basic instruction formats are used by: 

• Load and store instructions 

• Move instructions 

• Floating-Point computational instructions 

• Floating-Point branch instructions 

Floating-point instructions are mapped onto the MIPS coprocessor 
instructions, defining coprocessor unit number one (CP1) as the floating- 
point unit. 

Each operation is valid only for certain formats. Implementations may 
support some of these formats and operations through emulation, but they 
only need to support combinations that are valid (marked V in Table B.1). 

Combinations marked R in Table B.1 are not currently specified by this 
architecture, and cause an unimplemented instruction trap. They will be 
available for future extensions to the architecture. 
                      Source Format
   Operation   Single   Double   Word   Longword

   ADD           V        V       R        R
   SUB           V        V       R        R
   MUL           V        V       R        R
   DIV           V        V       R        R
   SQRT          V        V       R        R
   ABS           V        V       R        R
   MOV           V        V
   NEG           V        V       R        R
   TRUNC.L       V        V
   ROUND.L       V        V
   CEIL.L        V        V
   FLOOR.L       V        V
   TRUNC.W       V        V
   ROUND.W       V        V
   CEIL.W        V        V
   FLOOR.W       V        V
   CVT.S                  V       V        V
   CVT.D         V                V        V
   CVT.W         V        V
   CVT.L         V        V
   C             V        V       R        R

Table B.1 Valid FPU Instruction Formats 

The coprocessor branch on condition true/false instructions can be 
used to logically negate any predicate. Thus, the 32 possible conditions 
require only 16 distinct comparisons, as shown in Table B.2 below. 
        Condition                     Relations                 Invalid
   Mnemonic                                                     Operation
   True    False   Code   Greater   Less    Equal   Unordered   Exception If
                          Than      Than                        Unordered

   F       T        0        F       F        F        F          No
   UN      OR       1        F       F        F        T          No
   EQ      NEQ      2        F       F        T        F          No
   UEQ     OGL      3        F       F        T        T          No
   OLT     UGE      4        F       T        F        F          No
   ULT     OGE      5        F       T        F        T          No
   OLE     UGT      6        F       T        T        F          No
   ULE     OGT      7        F       T        T        T          No
   SF      ST       8        F       F        F        F          Yes
   NGLE    GLE      9        F       F        F        T          Yes
   SEQ     SNE     10        F       F        T        F          Yes
   NGL     GL      11        F       F        T        T          Yes
   LT      NLT     12        F       T        F        F          Yes
   NGE     GE      13        F       T        F        T          Yes
   LE      NLE     14        F       T        T        F          Yes
   NGT     GT      15        F       T        T        T          Yes


Table B.2 Logical Negation of Predicates by Condition True/False 

Floating-Point Loads, Stores, and Moves 

All movement of data between the floating-point coprocessor and 
memory is accomplished by coprocessor load and store operations, which 
reference the floating-point coprocessor General Purpose registers. These 
operations are unformatted; no format conversions are performed and, 
therefore, no floating-point exceptions can occur due to these operations. 

Data may also be directly moved between the floating-point 
coprocessor and the processor by move to coprocessor and move from 
coprocessor instructions. Like the floating-point load and store operations, 
move to/from operations perform no format conversions and never cause 
floating-point exceptions. 

An additional pair of coprocessor registers, called Floating-Point 
Control registers, is available. The only data movement operations 
supported for these registers are moves to and from processor General 
Purpose registers. 
Floating-Point Operations 

The floating-point unit operation set includes: 

• floating-point add 

• floating-point subtract 

• floating-point multiply 

• floating-point divide 

• floating-point square root 

• convert between fixed-point and floating-point formats 

• convert between floating-point formats 

• floating-point compare 

These operations satisfy the accuracy requirements of IEEE Standard 
754. Specifically, these operations obtain a result which is identical to 
an infinite-precision result rounded to the specified format, using the 
current rounding mode. 

Instructions must specify the format of their operands. Except for 
conversion functions, mixed-format operations are not provided. 

Instruction Notation Conventions 

In this appendix, all variable subfields in an instruction format (such 
as fs, ft, immediate, and so on) are shown in lower-case. The instruction 
name (such as ADD, SUB, and so on) is shown in upper-case. 

For the sake of clarity, we sometimes use an alias for a variable subfield 
in the formats of specific instructions. For example, we use rs = base in 
the format for load and store instructions. Such an alias is always lower 
case, since it refers to a variable subfield. 

In some instructions, the instruction subfields op and function can 
have constant 6-bit values. When reference is made to these instructions, 
upper-case mnemonics are used. For instance, in the floating-point ADD 
instruction we use op = COP1 and function = FADD. In other cases, a 
single field has both fixed and variable subfields, so the name contains 
both upper and lower case characters. Bit encodings for mnemonics are 
shown in Figure B.3 at the end of this appendix, and are also included with 
each individual instruction. 

In the instruction description examples that follow, the Operation 
section describes the operation performed by each instruction using a 
high-level language notation. 

Instruction Notation Examples 

The following examples illustrate the application of some of the 
instruction notation conventions: 



Example #1: 

   GPR[rt] <- immediate || 0^{16} 

Sixteen zero bits are concatenated with an immediate 
value (typically 16 bits), and the 32-bit string (with the lower 
16 bits set to zero) is assigned to General Purpose Register rt. 



Example #2: 

   (immediate_{15})^{16} || immediate_{15..0} 

Bit 15 (the sign bit) of an immediate value is extended for 
16 bit positions, and the result is concatenated with bits 15 
through 0 of the immediate value to form a 32-bit sign 
extended value. 
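
The two notation examples can be checked numerically with a small Python sketch. The helper names `concat`, `replicate`, `example1`, and `example2` are hypothetical; they model the concatenation and bit-replication operators on plain integers.

```python
def concat(*fields):
    """Concatenate (value, width) bit fields, leftmost field most significant."""
    result = 0
    for value, width in fields:
        result = (result << width) | (value & ((1 << width) - 1))
    return result

def replicate(bit, count):
    """A single bit repeated count times (the x^y notation)."""
    return ((1 << count) - 1) if bit & 1 else 0

def example1(immediate):
    """The immediate in the upper 16 bits, sixteen zero bits below it."""
    return concat((immediate, 16), (replicate(0, 16), 16))

def example2(immediate):
    """Bit 15 replicated 16 times, followed by bits 15..0 of the immediate."""
    sign = (immediate >> 15) & 1
    return concat((replicate(sign, 16), 16), (immediate, 16))
```

For an immediate of 0x8001, `example2` yields 0xFFFF8001, the 32-bit sign-extended value described in Example #2.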
Load and Store Instructions 

In the R4600 implementation, the instruction immediately following a 
load may use the contents of the register being loaded. In such cases, the 
hardware interlocks, requiring additional real cycles, so scheduling load 
delay slots is still desirable, although not required for functional code. 

The behavior of the load and store instructions is dependent on the 
width of the FGRs. 

• When the FR bit in the Status register equals zero, the Floating-Point 
General registers (FGRs) are 32 bits wide. 

• When the FR bit in the Status register equals one, the Floating-Point 
General registers (FGRs) are 64 bits wide. 

In the load and store operation descriptions, the functions listed in 
Table B.3 are used to summarize the handling of virtual addresses and 
physical memory. 



   Function             Meaning

   AddressTranslation   Uses the TLB to find the physical address given the
                        virtual address. The function fails and an exception
                        is taken if the required translation is not present
                        in the TLB.

   LoadMemory           Uses the cache and main memory to find the contents
                        of the word containing the specified physical
                        address. The low-order two bits of the address and
                        the Access Type field indicate which of the four
                        bytes within the data word need to be returned. If
                        the cache is enabled for this access, the entire
                        word is returned and loaded into the cache.

   StoreMemory          Uses the cache, write buffer, and main memory to
                        store the word or part of word specified as data in
                        the word containing the specified physical address.
                        The low-order two bits of the address and the Access
                        Type field indicate which of the four bytes within
                        the data word should be stored.



Table B.3 Load and Store Common Functions 

Figure B.1 shows the I-Type instruction format used by load and store 
operations. 

   I-Type (Immediate) 

   31    26  25   21  20   16  15                   0
   op        base     ft       offset
      6       5        5            16

   op      is a 6-bit operation code 
   base    is the 5-bit base register specifier 
   ft      is a 5-bit source (for stores) or destination (for loads) FPA 
           register specifier 
   offset  is the 16-bit signed immediate offset 

Figure B.1 Load and Store Instruction Format 
All coprocessor loads and stores reference aligned-word data items. 
Thus, for word loads and stores, the access type field is always WORD, and 
the low-order two bits of the address must always be zero. 

For doubleword loads and stores, the access type field is always 
DOUBLEWORD, and the low-order three bits of the address must always 
be zero. 

Regardless of byte-numbering order (endianness), the address 
specifies that byte which has the smallest byte-address in the addressed 
field. For a big-endian machine, this is the leftmost byte; for a little-endian 
machine, this is the rightmost byte. 
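
The alignment rules above amount to a simple mask test on the low-order address bits. The following Python sketch is illustrative only; the function name `is_aligned` and the size table are assumptions, not part of the architecture definition.

```python
ACCESS_SIZE = {"WORD": 4, "DOUBLEWORD": 8}  # size in bytes

def is_aligned(address, access_type):
    """True if the low-order two (WORD) or three (DOUBLEWORD) bits are zero."""
    size = ACCESS_SIZE[access_type]
    return (address & (size - 1)) == 0
```

An address such as 0x1004 satisfies the word rule but not the doubleword rule.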

Computational Instructions 

Computational instructions include all of the arithmetic floating-point 
operations performed by the FPU. 

Figure B.2 shows the R-Type instruction format used for 
computational operations. 



   R-Type (Register) 

   31    26  25   21  20   16  15   11  10    6  5        0
   COP1      fmt      ft       fs       fd       function
      6       5        5        5        5          6

   COP1      is a 6-bit operation code 
   fmt       is a 5-bit format specifier 
   fs        is a 5-bit source1 register 
   ft        is a 5-bit source2 register 
   fd        is a 5-bit destination register 
   function  is a 6-bit function field 



Figure B.2 Computational Instruction Format 

The function field indicates the floating-point operation to be 
performed. 

Each floating-point instruction can be applied to a number of operand 
formats. The operand format for an instruction is specified by the 5-bit 
format field; decoding for this field is shown in Table B.4. 



   Code    Mnemonic   Size       Format

   16      S          single     Binary floating-point
   17      D          double     Binary floating-point
   18                            Reserved
   19                            Reserved
   20      W          single     32-bit binary fixed-point
   21      L          longword   64-bit binary fixed-point
   22-31                         Reserved


Table B.4 Format Field Decoding 
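
As a worked illustration of the R-Type layout and the format codes in Table B.4, the Python sketch below packs and unpacks a computational instruction word. The names `encode_rtype` and `decode_rtype` are hypothetical; the COP1 opcode value 010001 is taken from the individual instruction formats in this appendix.

```python
COP1 = 0b010001
FMT = {"S": 16, "D": 17, "W": 20, "L": 21}  # codes from Table B.4

def encode_rtype(fmt, ft, fs, fd, function):
    """Pack COP1 | fmt | ft | fs | fd | function into a 32-bit word."""
    return ((COP1 << 26) | (FMT[fmt] << 21) | (ft << 16)
            | (fs << 11) | (fd << 6) | function)

def decode_rtype(word):
    """Split a 32-bit word back into its R-Type fields."""
    names = {code: name for name, code in FMT.items()}
    return {
        "fmt": names.get((word >> 21) & 0x1F, "Reserved"),
        "ft": (word >> 16) & 0x1F,
        "fs": (word >> 11) & 0x1F,
        "fd": (word >> 6) & 0x1F,
        "function": word & 0x3F,
    }
```

For example, ADD.D with fd = 0, fs = 2, ft = 4 (function code 0) packs to 0x46241000.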
Table B.5 lists all floating-point instructions. 

   Code
   (5:0)   Mnemonic   Operation

   0       ADD        Add
   1       SUB        Subtract
   2       MUL        Multiply
   3       DIV        Divide
   4       SQRT       Square root
   5       ABS        Absolute value
   6       MOV        Move
   7       NEG        Negate
   8       ROUND.L    Convert to long fixed-point, rounded to nearest/even
   9       TRUNC.L    Convert to long fixed-point, rounded toward zero
   10      CEIL.L     Convert to long fixed-point, rounded to +∞
   11      FLOOR.L    Convert to long fixed-point, rounded to -∞
   12      ROUND.W    Convert to single fixed-point, rounded to nearest/even
   13      TRUNC.W    Convert to single fixed-point, rounded toward zero
   14      CEIL.W     Convert to single fixed-point, rounded to +∞
   15      FLOOR.W    Convert to single fixed-point, rounded to -∞
   16-31   -          Reserved
   32      CVT.S      Convert to single floating-point
   33      CVT.D      Convert to double floating-point
   34      -          Reserved
   35      -          Reserved
   36      CVT.W      Convert to 32-bit binary fixed-point
   37      CVT.L      Convert to 64-bit binary fixed-point
   38-47   -          Reserved
   48-63   C          Floating-point compare


Table B.5 Floating-Point Instructions and Operations 

In the following pages, the notation FGR refers to the 32 General 
Purpose registers FGR0 through FGR31 of the FPU, and FPR refers to the 
floating-point registers of the FPU. 

• When the FR bit in the Status register (SR(26)) equals zero, only the 
even floating-point registers are valid and the 32 General Purpose 
registers of the FPU are 32 bits wide. 

• When the FR bit in the Status register (SR(26)) equals one, both odd 
and even floating-point registers may be used and the 32 General 
Purpose registers of the FPU are 64 bits wide. 

The following routines are used in the description of the floating-point 
operations to retrieve the value of an FPR or to change the value of an FGR: 
FR = 0 

value <- ValueFPR(fpr, fmt) 
   case fmt of 
      S, W: 
         if fpr_{0} = 0 then 
            value <- FGR[fpr] 
         else 
            value <- FGR[fpr - 1] 
         endif 
      D: 
         /* undefined for fpr not even */ 
         value <- FGR[fpr] 
   end 

StoreFPR(fpr, fmt, value): 
   case fmt of 
      S, W: 
         if fpr_{0} = 0 then 
            FGR[fpr] <- FGR[fpr]_{63..32} || value 
         else 
            FGR[fpr - 1] <- value || FGR[fpr - 1]_{31..0} 
         endif 
      D: 
         /* undefined for fpr not even */ 
         FGR[fpr] <- value 
   end 

FR = 1 

value <- ValueFPR(fpr, fmt) 
   case fmt of 
      S: 
         value <- FGR[fpr]_{31..0} 
      D, L: 
         value <- FGR[fpr] 
      W: 
         value <- FGR[fpr] 
   end 

StoreFPR(fpr, fmt, value): 
   case fmt of 
      S, W: 
         FGR[fpr] <- undefined^{32} || value 
      D, L: 
         FGR[fpr] <- value 
   end 
ABS.fmt             Floating-Point Absolute Value             ABS.fmt

   31    26  25   21  20   16  15   11  10    6  5      0
   COP1      fmt      00000    fs       fd       ABS
   010001                                        000101
      6       5        5        5        5         6

Format: 

ABS.fmt fd, fs 

Description: 

The contents of the FPU register specified by fs are interpreted in the 
specified format and the arithmetic absolute value is taken. The result is 
placed in the floating-point register specified by fd. 

The absolute value operation is arithmetic; a NaN operand signals 
invalid operation. 

This instruction is valid only for single- and double-precision 
floating-point formats. The operation is not defined if bit 0 of any register 
specification is set and the FR bit in the Status register equals zero, since 
the register numbers specify an even-odd pair of adjacent coprocessor 
general registers. When the FR bit in the Status register equals one, both 
even and odd register numbers are valid. 

Operation: 



T: StoreFPR(fd, fmt, AbsoluteValue(ValueFPR(fs, fmt))) 



Exceptions: 

Coprocessor unusable exception 
Coprocessor exception trap 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
ADD.fmt                  Floating-Point Add                  ADD.fmt

   31    26  25   21  20   16  15   11  10    6  5      0
   COP1      fmt      ft       fs       fd       ADD
   010001                                        000000
      6       5        5        5        5         6

Format: 

ADD.fmt fd, fs, ft 

Description: 

The contents of the FPU registers specified by fs and ft are interpreted 
in the specified format and arithmetically added. The result is calculated 
as if to infinite precision and then rounded to the specified format 
(fmt), according to the current rounding mode. The result is placed in the 
floating-point register (FPR) specified by fd. 

This instruction is valid only for single- and double-precision 
floating-point formats. The operation is not defined if bit 0 of any register 
specification is set and the FR bit in the Status register equals zero, since 
the register numbers specify an even-odd pair of adjacent coprocessor 
general registers. When the FR bit in the Status register equals one, both 
even and odd register numbers are valid. 

Operation: 



T: StoreFPR(fd, fmt, ValueFPR(fs, fmt) + ValueFPR(ft, fmt)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
Inexact exception 
Overflow exception 
Underflow exception 
BC1F           Branch On FPU False (Coprocessor 1)           BC1F

   31    26  25   21  20   16  15                   0
   COP1      BC       BCF      offset
   010001    01000    00000
      6       5        5            16
Format: 

BC1F offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. If the result of the last floating-point compare is false, 
the program branches to the target address, with a delay of one 
instruction. 



Operation: 




T-1:  condition <- not COC[1] 
T:    target <- (offset_{15})^{46} || offset || 0^{2} 
T+1:  if condition then 
         PC <- PC + target 
      endif 



Exceptions: 

Coprocessor unusable exception 
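
The branch target computation described above can be worked through numerically. The Python sketch below assumes 4-byte instructions, so the delay slot lies at the branch address plus 4; the function name `branch_target` is hypothetical.

```python
MASK64 = (1 << 64) - 1

def branch_target(branch_address, offset):
    """Delay-slot address plus the 16-bit offset, shifted left 2 and sign-extended."""
    offset &= 0xFFFF
    displacement = offset << 2          # 18-bit byte displacement
    if offset & 0x8000:                 # negative offset: apply the sign
        displacement -= 1 << 18
    delay_slot = branch_address + 4     # assumes 4-byte instructions
    return (delay_slot + displacement) & MASK64
```

An offset of 0xFFFF (that is, -1) on a branch at address 0x1000 therefore targets 0x1000 itself, since the displacement of -4 is applied to the delay-slot address 0x1004.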
BC1FL       Branch On FPU False Likely (Coprocessor 1)       BC1FL

   31    26  25   21  20   16  15                   0
   COP1      BC       BCFL     offset
   010001    01000    00010
      6       5        5            16
Format: 

BC1FL offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. 

If the result of the last floating-point compare is false, the program 
branches to the target address, with a delay of one instruction. If the 
conditional branch is not taken, the instruction in the branch delay slot is 
nullified. 



Operation: 




T-1:  condition <- not COC[1] 
T:    target <- (offset_{15})^{46} || offset || 0^{2} 
T+1:  if condition then 
         PC <- PC + target 
      else 
         NullifyCurrentInstruction 
      endif 



Exceptions: 

Coprocessor unusable exception 
BC1T            Branch On FPU True (Coprocessor 1)            BC1T

   31    26  25   21  20   16  15                   0
   COP1      BC       BCT      offset
   010001    01000    00001
      6       5        5            16

Format: 

BC1T offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. If the result of the last floating-point compare is true, 
the program branches to the target address, with a delay of one 
instruction. 



Operation: 








T-1:  condition <- COC[1] 
T:    target <- (offset_{15})^{46} || offset || 0^{2} 
T+1:  if condition then 
         PC <- PC + target 
      endif 



Exceptions: 

Coprocessor unusable exception 
BC1TL        Branch On FPU True Likely (Coprocessor 1)        BC1TL

   31    26  25   21  20   16  15                   0
   COP1      BC       BCTL     offset
   010001    01000    00011
      6       5        5            16
Format: 

BC1TL offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. 

If the result of the last floating-point compare is true, the program 
branches to the target address, with a delay of one instruction. If the 
conditional branch is not taken, the instruction in the branch delay slot is 
nullified. 



Operation: 




T-1:  condition <- COC[1] 
T:    target <- (offset_{15})^{46} || offset || 0^{2} 
T+1:  if condition then 
         PC <- PC + target 
      else 
         NullifyCurrentInstruction 
      endif 



Exceptions: 

Coprocessor unusable exception 
C.cond.fmt              Floating-Point Compare              C.cond.fmt

   31    26  25   21  20   16  15   11  10    6  5   4  3     0
   COP1      fmt      ft       fs       00000    FC*    cond*
   010001
      6       5        5        5        5        2       4
Format: 

C.cond.fmt fs, ft 

Description: 

The contents of the floating-point registers specified by fs and ft are 
interpreted in the specified format and arithmetically compared. 

A result is determined based on the comparison and the conditions 
specified in the instruction. If one of the values is a Not a Number (NaN), 
and the high-order bit of the condition field is set, an invalid operation 
exception is taken. After a one-instruction delay, the condition is available 
for testing with branch on floating-point coprocessor condition 
instructions. 

Comparisons are exact and can neither overflow nor underflow. Four 
mutually-exclusive relations are possible as results: less than, equal, 
greater than, and unordered. The last case arises when one or both of the 
operands are NaN; every NaN compares unordered with everything, 
including itself. 

Comparisons ignore the sign of zero, so +0 = -0. 

This instruction is valid only for single- and double-precision 
floating-point formats. The operation is not defined if bit 0 of any register 
specification is set and the FR bit in the Status register equals zero, since 
the register numbers specify an even-odd pair of adjacent coprocessor 
general registers. When the FR bit in the Status register equals one, both 
even and odd register numbers are valid. 

Note: *See "FPU Instruction Opcode Bit Encoding" at the end of 
Appendix B. 
Operation: 

   if NaN(ValueFPR(fs, fmt)) or NaN(ValueFPR(ft, fmt)) then 
      less <- false 
      equal <- false 
      unordered <- true 
      if cond_{3} then 
         signal InvalidOperationException 
      endif 
   else 
      less <- ValueFPR(fs, fmt) < ValueFPR(ft, fmt) 
      equal <- ValueFPR(fs, fmt) = ValueFPR(ft, fmt) 
      unordered <- false 
   endif 
   condition <- (cond_{2} and less) or (cond_{1} and equal) or 
                (cond_{0} and unordered) 
   FCR[31]_{23} <- condition 
   COC[1] <- condition 



Exceptions: 

Coprocessor unusable 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
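
The compare operation can be mirrored in a few lines of Python to show how the four bits of the cond field select among the less, equal, and unordered relations. This is a sketch only: the function name `fp_compare` is hypothetical, Python floats stand in for the FPU operands, and the invalid operation signal is returned as a flag instead of raising an exception.

```python
import math

def fp_compare(fs, ft, cond):
    """Return (condition, invalid) for a 4-bit cond code."""
    if math.isnan(fs) or math.isnan(ft):
        less, equal, unordered = False, False, True
        invalid = bool(cond & 0b1000)   # cond bit 3: signal if unordered
    else:
        less, equal, unordered = fs < ft, fs == ft, False
        invalid = False
    condition = (
        (bool(cond & 0b0100) and less)
        or (bool(cond & 0b0010) and equal)
        or (bool(cond & 0b0001) and unordered)
    )
    return condition, invalid
```

With cond = 4 (OLT) the result is true only for an ordered less-than; with cond = 12 (LT) a NaN operand yields a false condition and sets the invalid flag.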
CEIL.L.fmt   Floating-Point Ceiling to Long Fixed-Point Format   CEIL.L.fmt

   31    26  25   21  20   16  15   11  10    6  5      0
   COP1      fmt      00000    fs       fd       CEIL.L
   010001                                        001010
      6       5        5        5        5         6

Format: 

CEIL.L.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are
interpreted in the specified source format, fmt, and arithmetically
converted to the long fixed-point format. The result is placed in the
floating-point register specified by fd.

Regardless of the setting of the current rounding mode, the conversion
is rounded as if the current rounding mode is round to +∞ (2).

This instruction is valid only for conversion from single- or double- 
precision floating-point formats. When the FR bit in the Status register 
equals one, both even and odd register numbers are valid. 

When the source operand is an Infinity, NaN, or the correctly rounded
integer result is outside of -2^63 to 2^63 - 1, the Invalid operation exception
is raised. If the Invalid operation is not enabled then no exception is taken
and 2^63 - 1 is returned.



Operation: 



StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L)) 
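
As a rough illustration of the range rule described above, the following Python sketch models the conversion result. It is an assumption-laden model rather than the hardware behaviour: the name ceil_l, the invalid_enabled parameter and the use of ArithmeticError to stand in for a taken exception are all hypothetical.

```python
import math

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1

def ceil_l(value, invalid_enabled=False):
    """Round toward +infinity and apply the 64-bit range check.

    Returns (result, invalid). When the Invalid operation is not enabled,
    an out-of-range, Infinity or NaN source yields 2^63 - 1.
    """
    if math.isnan(value) or math.isinf(value):
        in_range = False
    else:
        result = math.ceil(value)              # exact integer result
        in_range = INT64_MIN <= result <= INT64_MAX
    if not in_range:
        if invalid_enabled:
            # Stand-in for taking the Invalid operation exception.
            raise ArithmeticError("Invalid operation exception")
        return INT64_MAX, True
    return result, False
```

With the exception disabled, ceil_l(2.0 ** 63) returns 2^63 - 1 with the invalid indication set, while ceil_l(-2.9) returns -2.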



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
CEIL.W.fmt    Floating-Point Ceiling to Single Fixed-Point Format    CEIL.W.fmt

Bits:    31..26    25..21    20..16    15..11    10..6    5..0
Field:   COP1      fmt       0         fs        fd       CEIL.W
Value:                       00000

Format:

CEIL.W.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are 
interpreted in the specified source format, fmt, and arithmetically 
converted to the single fixed-point format. The result is placed in the 
floating-point register specified by fd. 

Regardless of the setting of the current rounding mode, the conversion 
is rounded as if the current rounding mode is round to +∞ (2).

This instruction is valid only for conversion from single- or double-
precision floating-point formats. The operation is not defined if bit 0 of any
register specification is set and the FR bit in the Status register equals zero,
since the register numbers specify an even-odd pair of adjacent
coprocessor general registers. When the FR bit in the Status register
equals one, both even and odd register numbers are valid.

When the source operand is an Infinity or NaN, or the correctly
rounded integer result is outside of -2^31 to 2^31 - 1, the Invalid operation
exception is raised. If the Invalid operation is not enabled then no
exception is taken and 2^31 - 1 is returned.

Operation: 



T: StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
CFC1    Move Control Word From FPU (Coprocessor 1)    CFC1

Bits:    31..26    25..21    20..16    15..11    10..0
Field:   COP1      CF        rt        fs        0
Value:   010001    00010                         000 0000 0000
Width:   6         5         5         5         11

Format: 

CFC1 rt, fs 

Description: 

The contents of the FPU control register fs are loaded into general 
register rt. 

This operation is only defined when fs equals 0 or 31.

The contents of general register rt are undefined for time T of the 
instruction immediately following this load instruction. 

Operation: 



T:    temp <- FCR[fs]

T+1:  GPR[rt] <- (temp_31)^32 || temp
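
The T+1 step replicates bit 31 of the 32-bit control word into the upper 32 bits of the 64-bit general register. A minimal Python sketch of that sign extension follows; the function name is hypothetical and registers are modelled as unsigned integers.

```python
def sign_extend_32_to_64(temp):
    """Return the 64-bit value formed by 32 copies of bit 31 followed by temp."""
    temp &= 0xFFFFFFFF                         # keep the 32-bit control word
    upper = 0xFFFFFFFF if (temp >> 31) & 1 else 0
    return (upper << 32) | temp
```

For example, a control word of 0x80000000 becomes 0xFFFFFFFF80000000 in the general register.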



Exceptions: 

Coprocessor unusable exception 
CTC1    Move Control Word To FPU (Coprocessor 1)    CTC1

Field:   COP1      CT        rt        fs        0
Value:                                           000 0000 0000
Width:   6         5         5         5         11


Format: 

CTC1 rt, fs 

Description: 

The contents of general register rt are loaded into FPU control register
fs. This operation is only defined when fs equals 31.

Writing to Control Register 31, the floating-point Control/Status
register, causes an interrupt or exception if any cause bit and its
corresponding enable bit are both set. The register will be written before
the exception occurs. The contents of floating-point control register fs are
undefined for time T of the instruction immediately following this load
instruction.

Operation: 



T:    temp <- GPR[rt]_31..0

T+1:  FCR[fs] <- temp

      COC[1] <- FCR[31]_23
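
The cause/enable rule for a write to Control Register 31 can be sketched as below. The bit positions are an assumption not stated in this section: the sketch takes the five cause bits to occupy bits 16..12 and the matching enable bits to occupy bits 11..7, as in the usual MIPS Control/Status register layout, and the function name is hypothetical. Bit 23 is the condition bit copied to COC[1], as in the operation above.

```python
CAUSE_SHIFT = 12     # assumed position of the lowest cause bit
ENABLE_SHIFT = 7     # assumed position of the lowest enable bit
FIVE_BITS = 0x1F
COND_BIT = 1 << 23   # FCR[31] bit 23, copied to COC[1]

def ctc1_write_fcr31(value):
    """Model a write to FCR[31].

    Returns (new_fcr31, coc1, traps). The register is written first;
    traps reports whether any cause bit has its enable bit set.
    """
    value &= 0xFFFFFFFF
    cause = (value >> CAUSE_SHIFT) & FIVE_BITS
    enable = (value >> ENABLE_SHIFT) & FIVE_BITS
    traps = bool(cause & enable)
    coc1 = bool(value & COND_BIT)
    return value, coc1, traps
```

A value with a cause bit set but its enable bit clear is simply stored; no exception results.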



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
Division by zero exception 
Inexact exception 
Overflow exception 
Underflow exception 
CVT.D.fmt    Floating-Point Convert to Double Floating-Point Format    CVT.D.fmt

Format:

CVT.D.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are interpreted
in the specified source format, fmt, and arithmetically converted to the
double binary floating-point format. The result is placed in the floating-
point register specified by fd.

This instruction is valid only for conversions from single floating-point
format, or from 32-bit or 64-bit fixed-point format.

If the single floating-point or single fixed-point format is specified, the
operation is exact. The operation is not defined if bit 0 of any register
specification is set and the FR bit in the Status register equals zero, since
the register numbers specify an even-odd pair of adjacent coprocessor
general registers. When the FR bit in the Status register equals one, both
even and odd register numbers are valid.

Operation: 



StoreFPR (fd, D, ConvertFmt(ValueFPR(fs, fmt), fmt, D)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
Underflow exception 
CVT.L.fmt    Floating-Point Convert to Long Fixed-Point Format    CVT.L.fmt

Format:

CVT.L.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are 
interpreted in the specified source format, fmt, and arithmetically
converted to the long fixed-point format. The result is placed in the 
floating-point register specified by fd. 

This instruction is valid only for conversions from single- or double- 
precision floating-point formats. 

When the source operand is an Infinity, NaN, or the correctly rounded
integer result is outside of -2^63 to 2^63 - 1, the Invalid operation exception is
raised. If the Invalid operation is not enabled then no exception is taken
and 2^63 - 1 is returned.



Operation: 



StoreFPR (fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
Format:

CVT.S.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are
interpreted in the specified source format, fmt, and arithmetically
converted to the single binary floating-point format. The result is placed
in the floating-point register specified by fd. Rounding occurs according to
the currently specified rounding mode.

This instruction is valid only for conversions from double floating-point
format, or from 32-bit or 64-bit fixed-point format. The operation is not
defined if bit 0 of any register specification is set and the FR bit in the
Status register equals zero, since the register numbers specify an even-odd
pair of adjacent coprocessor general registers. When the FR bit in the
Status register equals one, both even and odd register numbers are valid.

Operation: 



StoreFPR(fd, S, ConvertFmt(ValueFPR(fs, fmt), fmt, S)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
Underflow exception 






FPU Instruction Set Details 



Appendix B 



CVT.W.fmt 
Format:

CVT.W.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are
interpreted in the specified source format, fmt, and arithmetically
converted to the single fixed-point format. The result is placed in the
floating-point register specified by fd. This instruction is valid only for
conversion from single- or double-precision floating-point formats. The
operation is not defined if bit 0 of any register specification is set and the
FR bit in the Status register equals zero, since the register numbers specify
an even-odd pair of adjacent coprocessor general registers. When the FR
bit in the Status register equals one, both even and odd register numbers
are valid.

When the source operand is an Infinity or NaN, or the correctly
rounded integer result is outside of -2^31 to 2^31 - 1, an Invalid operation
exception is raised. If Invalid operation is not enabled, then no exception
is taken and 2^31 - 1 is returned.



Operation: 



StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
Format:

DIV.fmt fd, fs, ft 

Description: 

The contents of the floating-point registers specified by fs and ft are
interpreted in the specified format and arithmetically divided. The result
is rounded as if calculated to infinite precision and then rounded to the
specified format, according to the current rounding mode. The result is
placed in the floating-point register specified by fd.

This instruction is valid only for single- or double-precision floating-
point formats.

The operation is not defined if bit 0 of any register specification is set
and the FR bit in the Status register equals zero, since the register
numbers specify an even-odd pair of adjacent coprocessor general
registers. When the FR bit in the Status register equals one, both even and
odd register numbers are valid.

Operation: 



T: StoreFPR(fd, fmt, ValueFPR(fs, fmt) / ValueFPR(ft, fmt))



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
Division-by-zero exception 
Inexact exception 
Overflow exception 
Underflow exception 
Format:

DMFC1 rt, fs

Description: 

The contents of register fs from the floating-point coprocessor are stored
into processor register rt.

The contents of general register rt are undefined for time T of the 
instruction immediately following this load instruction. 

The FR bit in the Status register specifies whether all 32 registers of the 
R4600 are addressable. When FR equals zero, this instruction is not 
defined when the least significant bit of fs is non-zero. When FR is set, fs 
may specify either odd or even registers. 

Operation: 



T:    if SR_26 = 1 then
          data <- CPR[1, fs]
      else
          data <- CPR[1, fs_4..1 || 0]
      endif

T+1:  GPR[rt] <- data



Exceptions: 

Coprocessor unusable exception 
Format:

DMTC1 rt, fs 

Description: 

The contents of general register rt are loaded into coprocessor register
fs of CP1.

The contents of floating-point register fs are undefined for time T of the
instruction immediately following this load instruction.

The FR bit in the Status register specifies whether all 32 registers of the 
R4600 are addressable. When FR equals zero, this instruction is not 
defined when the least significant bit of fs is non-zero. When FR equals 
one, fs may specify either odd or even registers. 



Operation: 




T:    data <- GPR[rt]

T+1:  if SR_26 = 1 then
          CPR[1, fs] <- data
      else
          CPR[1, fs_4..1 || 0] <- data
      endif



Exceptions: 

Coprocessor unusable exception 
Format:

FLOOR.L.fmt fd, fs

Description: 

The contents of the floating-point register specified by fs are
interpreted in the specified source format, fmt, and arithmetically
converted to the long fixed-point format. The result is placed in the
floating-point register specified by fd.

Regardless of the setting of the current rounding mode, the conversion
is rounded as if the current rounding mode is round to -∞ (3).

This instruction is valid only for conversion from single- or double- 
precision floating-point formats. 

When the source operand is an Infinity, NaN, or the correctly rounded
integer result is outside of -2^63 to 2^63 - 1, the Invalid operation exception
is raised. If the Invalid operation is not enabled then no exception is taken
and 2^63 - 1 is returned.



Operation: 



T: StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
Format:

FLOOR.W.fmt fd, fs

Description: 

The contents of the floating-point register specified by fs are 
interpreted in the specified source format, fmt, and arithmetically 
converted to the single fixed-point format. The result is placed in the 
floating-point register specified by fd. 

Regardless of the setting of the current rounding mode, the conversion
is rounded as if the current rounding mode is round to -∞ (RM = 3).

This instruction is valid only for conversion from single- or double-
precision floating-point formats. The operation is not defined if bit 0 of any
register specification is set and the FR bit in the Status register equals zero,
since the register numbers specify an even-odd pair of adjacent
coprocessor general registers. When the FR bit in the Status register
equals one, both even and odd register numbers are valid.

When the source operand is an Infinity or NaN, or the correctly
rounded integer result is outside of -2^31 to 2^31 - 1, an Invalid operation
exception is raised. If Invalid operation is not enabled, then no exception
is taken and 2^31 - 1 is returned.



Operation: 



StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
Format:

LDC1 ft, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form an unsigned effective address. 

When FR = 0, the contents of the doubleword at the memory location
specified by the effective address are loaded into registers ft and ft+1 of the
floating-point coprocessor. This instruction is not valid, and is undefined,
when the least significant bit of ft is non-zero.

When FR = 1, the contents of the doubleword at the memory location
specified by the effective address are loaded into the 64-bit register ft of the
floating-point coprocessor.

The FR bit of the Status register (SR_26) specifies whether all 32 registers
of the R4600 are addressable. If FR equals zero, this instruction is not
defined when the least significant bit of ft is non-zero. If FR equals one, ft
may specify either odd or even registers.

If any of the three least-significant bits of the effective address are non- 
zero, an address error exception takes place. 
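
The address formation and alignment rule described above can be illustrated with a short Python sketch. The helper names are hypothetical, addresses are modelled as 64-bit unsigned integers, and a ValueError stands in for the address error exception.

```python
MASK64 = (1 << 64) - 1

def sign_extend_16(offset):
    """Interpret the 16-bit offset field as a signed value."""
    offset &= 0xFFFF
    return offset - 0x10000 if offset & 0x8000 else offset

def doubleword_effective_address(base, offset):
    """Add the sign-extended offset to base and check doubleword alignment."""
    vaddr = (base + sign_extend_16(offset)) & MASK64
    if vaddr & 0x7:
        # Any of the three least-significant bits set: address error.
        raise ValueError("address error exception")
    return vaddr
```

For example, a base of 0x1000 with an offset field of 0xFFF8 (that is, -8) gives the aligned address 0xFF8, whereas an offset of 4 from the same base would raise the error.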

Operation: 



T:    vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation(vAddr, DATA)
      data <- LoadMemory(uncached, DOUBLEWORD, pAddr, vAddr, DATA)
      if SR_26 = 1 then
          CPR[1, ft] <- data
      else
          CPR[1, ft_4..1 || 0] <- data
      endif

Exceptions: 

Coprocessor unusable 
TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
Format:

LWC1 ft, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form an unsigned effective address. The contents of the
word at the memory location specified by the effective address are loaded
into register ft of the floating-point coprocessor.

The FR bit of the Status register specifies whether all 64-bit Floating- 
Point registers are addressable. If FR equals zero, LWC1 loads either the 
high or low half of the 16 even Floating-Point registers. If FR equals one, 
LWC1 loads the low 32-bits of both even and odd Floating-Point registers. 

If either of the two least-significant bits of the effective address is non- 
zero, an address error exception occurs. 

Operation: 



T:    vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation(vAddr, DATA)
      pAddr <- pAddr_PSIZE-1..3 || (pAddr_2..0 xor (ReverseEndian || 0^2))
      mem <- LoadMemory(uncached, WORD, pAddr, vAddr, DATA)
      byte <- vAddr_2..0 xor (BigEndianCPU || 0^2)
      if SR_26 = 1 then
          CPR[1, ft] <- undefined^32 || mem_31+8*byte..8*byte
      else if ft_0 = 0 then
          CPR[1, ft_4..1 || 0] <- CPR[1, ft_4..1 || 0]_63..32 || mem_31+8*byte..8*byte
      else
          CPR[1, ft_4..1 || 0] <- mem_31+8*byte..8*byte || CPR[1, ft_4..1 || 0]_31..0
      endif



Exceptions: 

Coprocessor unusable 
TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
Format:

MFC1 rt, fs 

Description: 

The contents of register fs from the floating-point coprocessor are
loaded into processor register rt.

The contents of register rt are undefined for time T of the instruction
immediately following this load instruction.

The FR bit of the Status register specifies whether all 32 registers of the
R4600 are addressable. If FR equals zero, MFC1 reads either the high or
low half of the 16 even Floating-Point registers. If FR equals one, MFC1
reads the low 32 bits of both even and odd Floating-Point registers.

Operation: 



T:    if SR_26 = 1 then
          data <- CPR[1, fs]
      else if fs_0 = 0 then
          data <- CPR[1, fs_4..1 || 0]_31..0
      else
          data <- CPR[1, fs_4..1 || 0]_63..32
      endif

T+1:  GPR[rt] <- (data_31)^32 || data
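
The register-half selection performed by MFC1 can be modelled in Python as follows. This is a sketch under stated assumptions: the floating-point register file is represented as a list of 32 unsigned 64-bit integers, the function name and argument order are hypothetical, and with FR = 1 only the low 32 bits are used, as the description states.

```python
def mfc1(fpr, fs, fr):
    """Return the 64-bit value MFC1 would place in the general register."""
    if fr == 1:
        word = fpr[fs] & 0xFFFFFFFF                 # low word of any register
    elif fs & 1 == 0:
        word = fpr[fs] & 0xFFFFFFFF                 # low half of the even register
    else:
        word = (fpr[fs & ~1] >> 32) & 0xFFFFFFFF    # high half of the even register
    # Sign-extend bit 31 into the upper 32 bits.
    return word | (0xFFFFFFFF00000000 if word & 0x80000000 else 0)
```

With FR = 0, an odd fs therefore addresses the upper word of the even register just below it.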



Exceptions: 

Coprocessor unusable exception 
Format:

MOV.fmt fd, fs 

Description: 

The contents of the FPU register specified by fs are interpreted in the 
specified format and are copied into the FPU register specified by fd. 

The move operation is non-arithmetic; no IEEE 754 exceptions occur 
as a result of the instruction. 

This instruction is valid only for single- or double-precision floating- 
point formats. 

The operation is not defined if bit 0 of any register specification is set
and the FR bit in the Status register equals zero, since the register
numbers specify an even-odd pair of adjacent coprocessor general
registers. When the FR bit in the Status register equals one, both even and
odd register numbers are valid.

Operation: 



StoreFPR(fd, fmt, ValueFPR(fs, fmt)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Format:

MTC1 rt, fs

Description:

The contents of register rt are loaded into the FPU general register at 
location fs. 

The contents of floating-point register fs are undefined for time T of the
instruction immediately following this load instruction.

The FR bit of the Status register specifies whether all 32 registers of the
R4600 are addressable. If FR equals zero, MTC1 loads either the high or
low half of the 16 even Floating-Point registers. If FR equals one, MTC1
loads the low 32 bits of both even and odd Floating-Point registers.

Operation: 



T:    data <- GPR[rt]_31..0

T+1:  if SR_26 = 1 then
          CPR[1, fs] <- undefined^32 || data
      else if fs_0 = 0 then
          CPR[1, fs_4..1 || 0] <- CPR[1, fs_4..1 || 0]_63..32 || data
      else
          CPR[1, fs_4..1 || 0] <- data || CPR[1, fs_4..1 || 0]_31..0
      endif
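
A Python sketch of the MTC1 write follows, again treating the floating-point register file as a list of 32 unsigned 64-bit integers. The function name is hypothetical, and the upper 32 bits that the operation leaves undefined when FR = 1 are arbitrarily modelled as zero.

```python
def mtc1(fpr, fs, fr, gpr_value):
    """Write the low 32 bits of gpr_value into the register half selected by fs."""
    data = gpr_value & 0xFFFFFFFF
    if fr == 1:
        fpr[fs] = data                               # upper half undefined; 0 here
    elif fs & 1 == 0:
        # Even fs: replace the low half, keep the high half.
        fpr[fs] = (fpr[fs] & 0xFFFFFFFF00000000) | data
    else:
        # Odd fs: replace the high half of the even register below it.
        even = fs & ~1
        fpr[even] = (data << 32) | (fpr[even] & 0xFFFFFFFF)
```

Two MTC1 operations, one to an even register and one to the following odd register number, thus assemble a full 64-bit value when FR = 0.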



Exceptions: 

Coprocessor unusable exception 
Format:

MUL.fmt fd, fs, ft 

Description: 

The contents of the floating-point registers specified by fs and ft are 
interpreted in the specified format and arithmetically multiplied. The 
result is rounded as if calculated to infinite precision and then rounded to 
the specified format according to the current rounding mode. The result 
is placed in the floating-point register specified by fd. 

This instruction is valid only for single- or double-precision floating- 
point formats. 

The operation is not defined if bit 0 of any register specification is set
and the FR bit in the Status register equals zero, since the register
numbers specify an even-odd pair of adjacent coprocessor general
registers. When the FR bit in the Status register equals one, both even and
odd register numbers are valid.

Operation: 



StoreFPR (fd, fmt, ValueFPR(fs, fmt) * ValueFPR(ft, fmt)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
Inexact exception 
Overflow exception 
Underflow exception 
Format:

NEG.fmt fd, fs

Description:

The contents of the FPU register specified by fs are interpreted in the
specified format and the arithmetic negation is taken (polarity of the sign
bit is changed). The result is placed in the FPU register specified by fd.

The negate operation is arithmetic; a NaN operand signals invalid
operation.

This instruction is valid only for single- or double-precision floating-
point formats. The operation is not defined if bit 0 of any register
specification is set and the FR bit in the Status register equals zero, since
the register numbers specify an even-odd pair of adjacent coprocessor
general registers. When the FR bit in the Status register equals one, both
even and odd register numbers are valid.

Operation: 



StoreFPR(fd, fmt, Negate(ValueFPR(fs, fmt))) 
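
The sign-bit change performed by Negate can be illustrated for the double-precision format with the Python sketch below. The function name is hypothetical, and the sketch covers only the bit manipulation; it does not model the invalid operation signalled for a NaN operand.

```python
import struct

def neg_double(x):
    """Return x with the sign bit (bit 63 of the IEEE 754 encoding) inverted."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (result,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << 63)))
    return result
```

Because only the sign bit changes, negating +0 produces -0 rather than leaving the value unchanged.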



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
Format:

ROUND.L.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are 
interpreted in the specified source format, fmt, and arithmetically
converted to the long fixed-point format. The result is placed in the 
floating-point register specified by fd. 

Regardless of the setting of the current rounding mode, the conversion 
is rounded as if the current rounding mode is round to nearest/even (0). 

This instruction is valid only for conversion from single- or double- 
precision floating-point formats. 

When the source operand is an Infinity, NaN, or the correctly rounded
integer result is outside of -2^63 to 2^63 - 1, the Invalid operation exception
is raised. If the Invalid operation is not enabled then no exception is taken
and 2^63 - 1 is returned.

Operation: 



T: StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L)) 
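
The ROUND, TRUNC, CEIL and FLOOR conversions in this appendix differ only in the rounding mode they force, quoted in their descriptions as 0 (nearest/even), 1 (toward zero), 2 (toward +∞) and 3 (toward -∞). The Python sketch below shows the effect of each mode on finite in-range values; the table and function names are hypothetical, and the Infinity, NaN and out-of-range cases are not modelled here.

```python
import math

# Rounding-mode number -> Python function with the same rounding behaviour.
ROUNDERS = {
    0: round,        # nearest, ties to even
    1: math.trunc,   # toward zero
    2: math.ceil,    # toward +infinity
    3: math.floor,   # toward -infinity
}

def to_fixed(value, rm):
    """Convert a finite float to an integer using rounding mode rm (0..3)."""
    return ROUNDERS[rm](value)
```

Under mode 0, the halfway cases 2.5 and 3.5 convert to 2 and 4 respectively, since ties go to the even integer.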



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
Format:

ROUND.W.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are
interpreted in the specified source format, fmt, and arithmetically
converted to the single fixed-point format. The result is placed in the
floating-point register specified by fd.

Regardless of the setting of the current rounding mode, the conversion
is rounded as if the current rounding mode is round to nearest/even
(RM = 0).

This instruction is valid only for conversion from single- or double-
precision floating-point formats. The operation is not defined if bit 0 of any
register specification is set and the FR bit in the Status register equals zero,
since the register numbers specify an even-odd pair of adjacent
coprocessor general registers. When the FR bit in the Status register
equals one, both even and odd register numbers are valid.

When the source operand is an Infinity or NaN, or the correctly
rounded integer result is outside of -2^31 to 2^31 - 1, an Invalid operation
exception is raised. If Invalid operation is not enabled, then no exception
is taken and 2^31 - 1 is returned.

Operation: 



StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
Format:

SDC1 ft, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form an unsigned effective address. 

When FR = 0, the contents of registers ft and ft+1 from the floating-
point coprocessor are stored at the memory location specified by the
effective address. This instruction is not valid, and is undefined, when the
least significant bit of ft is non-zero.

When FR = 1, the contents of the 64-bit register ft are stored to the
doubleword at the memory location specified by the effective address. The
FR bit of the Status register (SR_26) specifies whether all 32 registers of the
R4600 are addressable. When FR equals zero, this instruction is not
defined if the least significant bit of ft is non-zero. If FR equals one, ft may
specify either odd or even registers.

If any of the three least-significant bits of the effective address are non- 
zero, an address error exception takes place. 

Operation: 



T:    vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation(vAddr, DATA)
      if SR_26 = 1 then
          data <- CPR[1, ft]
      else
          data <- CPR[1, ft_4..1 || 0]
      endif
      StoreMemory(uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)



Exceptions: 

Coprocessor unusable 
TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
Format:

SQRT.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are
interpreted in the specified format and the positive arithmetic square root
is taken. The result is rounded as if calculated to infinite precision and
then rounded to the specified format, according to the current rounding
mode. If the value of fs corresponds to -0, the result will be -0. The result
is placed in the floating-point register specified by fd.

This instruction is valid only for single- or double-precision floating-
point formats.

The operation is not defined if bit 0 of any register specification is set
and the FR bit in the Status register equals zero, since the register
numbers specify an even-odd pair of adjacent coprocessor general
registers. When the FR bit in the Status register equals one, both even and
odd register numbers are valid.

Operation: 



StoreFPR(fd, fmt, SquareRoot(ValueFPR(fs, fmt))) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
Inexact exception 






SUB.fmt    Floating-Point Subtract    SUB.fmt

Bits:    31..26    25..21    20..16    15..11    10..6    5..0
Field:   COP1      fmt       ft        fs        fd       SUB
Value:   010001                                           000001

Format: 

SUB.fmt fd, fs, ft 

Description: 

The contents of the floating-point registers specified by fs and ft are 
interpreted in the specified format and arithmetically subtracted. The 
result is rounded as if calculated to infinite precision and then rounded to 
the specified format according to the current rounding mode. The result 
is placed in the floating-point register specified by fd. 

This instruction is valid only for single- or double-precision floating- 
point formats. 

The operation is not defined if bit 0 of any register specification is set
and the FR bit in the Status register equals zero, since the register
numbers specify an even-odd pair of adjacent coprocessor general
registers. When the FR bit in the Status register equals one, both even and
odd register numbers are valid.

Operation: 



T: StoreFPR (fd, fmt, ValueFPR(fs, fmt) - ValueFPR(ft, fmt)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
Inexact exception 
Overflow exception 
Underflow exception 






SWC1    Store Word from FPU (Coprocessor 1)    SWC1

Bits:    31..26    25..21    20..16    15..0
Field:   SWC1      base      ft        offset
Value:   111001
Width:   6         5         5         16

Format: 

SWC1 ft, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form an unsigned effective address. The contents of 
register ft from the floating-point coprocessor are stored at the memory 
location specified by the effective address. 

The FR bit of the Status register specifies whether all 64-bit floating- 
point registers are addressable. 

If FR = 0, SWC1 stores either the high or low half of the 16 even 
floating-point registers. 

If FR = 1, SWC1 stores the low 32-bits of both even and odd floating- 
point registers. 

If either of the two least-significant bits of the effective address is
non-zero, an address error exception occurs.

Operation: 



vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
(pAddr, uncached) <- AddressTranslation(vAddr, DATA)
pAddr <- pAddr_PSIZE-1..3 || (pAddr_2..0 xor (ReverseEndian || 0^2))
byte <- vAddr_2..0 xor (BigEndianCPU || 0^2)
if SR_26 = 1 then
    data <- CPR[1, ft]_63-8*byte..0 || 0^(8*byte)
else if ft_0 = 0 then
    data <- CPR[1, ft_4..1 || 0]_63-8*byte..0 || 0^(8*byte)
else
    data <- 0^(32-8*byte) || CPR[1, ft_4..1 || 0]_63..32 || 0^(8*byte)
endif
StoreMemory(uncached, WORD, data, pAddr, vAddr, DATA)



Exceptions: 

Coprocessor unusable 
TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
Format:

TRUNC.L.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are
interpreted in the specified source format, fmt, and arithmetically
converted to the long fixed-point format. The result is placed in the
floating-point register specified by fd.

Regardless of the setting of the current rounding mode, the conversion 
is rounded as if the current rounding mode is round toward zero (1). 

This instruction is valid only for conversion from single- or double- 
precision floating-point formats. 

When the source operand is an Infinity, NaN, or the correctly rounded
integer result is outside of -2^63 to 2^63 - 1, the Invalid operation exception is
raised. If the Invalid operation is not enabled then no exception is taken
and 2^63 - 1 is returned.



Operation: 



StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
Format:

TRUNC.W.fmt fd, fs 

Description: 

The contents of the FPU register specified by fs are interpreted in the
specified source format, fmt, and arithmetically converted to the single
fixed-point format. The result is placed in the FPU register specified by fd.

Regardless of the setting of the current rounding mode, the conversion 
is rounded as if the current rounding mode is round toward zero (RM = 1). 

This instruction is valid only for conversion from single- or
double-precision floating-point formats. The operation is not defined if
bit 0 of any register specification is set and the FR bit in the Status
register equals zero, since the register numbers specify an even-odd pair
of adjacent coprocessor general registers. When the FR bit in the Status
register equals one, both even and odd register numbers are valid.

When the source operand is an Infinity or NaN, or the correctly
rounded integer result is outside of -2^31 to 2^31 - 1, an Invalid operation
exception is raised. If Invalid operation is not enabled, then no exception
is taken and 2^31 - 1 is returned.



Operation: 



StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W)) 
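As a rough illustration of the behavior described above, the following Python sketch models the truncating conversion with the Invalid operation exception disabled. The function name `trunc_fixed` and its `bits` parameter are hypothetical; `bits=32` is meant to correspond to TRUNC.W and `bits=64` to TRUNC.L. Python floats are double precision, so a single-precision source is assumed to have been widened already, and the Inexact and Unimplemented operation cases are not modelled.

```python
import math


def trunc_fixed(value, bits=32):
    """Truncate a float toward zero into a signed fixed-point result.

    Returns (result, invalid). With the Invalid operation exception
    disabled, an Infinity, a NaN, or an out-of-range result yields the
    largest positive integer and sets the invalid indication.
    """
    max_int = 2 ** (bits - 1) - 1
    min_int = -(2 ** (bits - 1))
    if math.isnan(value) or math.isinf(value):
        return max_int, True
    result = math.trunc(value)  # round toward zero, whatever RM says
    if result < min_int or result > max_int:
        return max_int, True
    return result, False
```

Calling `trunc_fixed(-2.9)` gives `(-2, False)`, while `trunc_fixed(2.0 ** 31)` gives `(2147483647, True)` because the truncated value is one past the largest 32-bit result.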



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
Figure B.3 Bit Encoding for FPU Instructions
Introduction

This appendix lists cycle counts and caveats for the timing of
R4600/R4700 cache operations.

Caveats About Cache Operations 

1. All cycle counts are in processor cycles. 

2. All cache ops have lower priority than cache misses, write backs and 
external requests. If the write back buffer contains unwritten data when 
a cache op is executed, the write back buffer will be retired before the cache 
op is begun. 

If an instruction cache miss occurs at the same time as a cache op is 
executed, the instruction cache miss will be handled first. Cache ops are 
mutually exclusive with respect to data cache misses. External requests 
will be completed before beginning a cache op. 

3. For all data cache ops the cache op machine waits for the store buffer 
and response buffer to empty before beginning the cache op. This can add 
3 cycles to any data cache op if there is data in the response buffer or store 
buffer. The response buffer contains data from the last data cache miss 
that has not yet been written to the data cache. The store buffer contains 
delayed store data waiting to be written to the data cache. 

4. Cache ops of the form xxxx_Writeback_xxxx may perform a write back 
which will fill the write back buffer. Write backs can affect subsequent 
cache ops, since they will stall until the write back buffer is written back 
to memory. Cache ops which fill the write back buffer are noted as 
(writeback) in the following tables. 

5. All cycle counts are best case assuming no interference from the 
mechanisms described above. 

Cache Operations Tables 

Table C.1 and Table C.2 list the data cache and instruction cache
operations and their cycle counts. A detailed explanation of the Fill_I
equation follows Table C.2.
Code [1]  Name                          Number of Cycles
--------  ----------------------------  ------------------------------------------
0         Index_Writeback_Invalidate_D  10 cycles if the cache line is clean.
                                        12 cycles if the cache line is dirty
                                        (Writeback).
1         Index_Load_Tag_D              7 cycles.
2         Index_Store_Tag_D             8 cycles.
3         Create_Dirty_Exclusive_D      10 cycles for a cache hit.
                                        13 cycles for a cache miss if the cache
                                        line is clean.
                                        15 cycles for a cache miss if the cache
                                        line is dirty (Writeback).
4         Hit_Invalidate_D              7 cycles for a cache miss.
                                        9 cycles for a cache hit.
5         Hit_Writeback_Invalidate_D    7 cycles for a cache miss.
                                        12 cycles for a cache hit if the cache
                                        line is clean.
                                        14 cycles for a cache hit if the cache
                                        line is dirty (Writeback).
6         Hit_Writeback_D               7 cycles for a cache miss.
                                        10 cycles for a cache hit if the cache
                                        line is clean.
                                        14 cycles for a cache hit if the cache
                                        line is dirty (Writeback).

Note:
[1] Code number corresponds to the code column of the CACHE instruction in Appendix A.

Table C.1 Primary Data Cache Operations
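For budgeting purposes, the best-case figures in Table C.1 can be captured in a small lookup. The Python sketch below is illustrative only: the function name `dcache_op_cycles`, its parameters, and the constant `BUFFER_DRAIN_PENALTY` are hypothetical, and treating caveat 3 as a flat 3-cycle addition is an assumption. Stalls caused by a full write back buffer (caveat 4), cache misses, and external requests are not modelled, so the result is a lower bound.

```python
# Assumption: caveat 3 is modelled as a flat 3-cycle addition when the
# store buffer or response buffer still holds data.
BUFFER_DRAIN_PENALTY = 3


def dcache_op_cycles(name, hit=True, dirty=False, buffers_busy=False):
    """Best-case processor cycles for a primary data cache op (Table C.1)."""
    if name == "Index_Writeback_Invalidate_D":
        cycles = 12 if dirty else 10
    elif name == "Index_Load_Tag_D":
        cycles = 7
    elif name == "Index_Store_Tag_D":
        cycles = 8
    elif name == "Create_Dirty_Exclusive_D":
        # On a miss, the count depends on whether the line is dirty.
        cycles = 10 if hit else (15 if dirty else 13)
    elif name == "Hit_Invalidate_D":
        cycles = 9 if hit else 7
    elif name == "Hit_Writeback_Invalidate_D":
        cycles = (14 if dirty else 12) if hit else 7
    elif name == "Hit_Writeback_D":
        cycles = (14 if dirty else 10) if hit else 7
    else:
        raise KeyError(name)
    return cycles + (BUFFER_DRAIN_PENALTY if buffers_busy else 0)
```

Summing the result over the lines of a buffer gives a rough floor for a flush loop; real sequences of writeback ops will take longer because each one waits for the write back buffer to drain.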
Code [1]  Name                Number of Cycles
--------  ------------------  --------------------------------------------------
0         Index_Invalidate_I  7 cycles.
1         Index_Load_Tag_I    7 cycles.
2         Index_Store_Tag_I   8 cycles.
3         n/a                 n/a
4         Hit_Invalidate_I    7 cycles for a cache miss.
                              9 cycles for a cache hit.
5         Fill_I              Cycle number must be calculated based on the
                              system response to a memory access, because
                              Fill_I causes an instruction cache refill from
                              memory.
                              This equation yields the number of processor
                              cycles for a Fill_I cache op [2]:
                              Number of cycles for a Fill_I CacheOp =
                                10 + {0 - (SYSDIV - 1)} + (2 x SYSDIV) +
                                (ML x SYSDIV) + (D x SYSDIV) [3]
6         Hit_Writeback_I     7 cycles for a cache miss.
                              20 cycles for a cache hit (Writeback).

Note:
[1] Code number corresponds to the code column of the CACHE instruction in Appendix A.
[2] For definitions and discussion of the Fill_I equation variables, refer to the
    subsection "Details of the Fill_I Equation," which follows this table.
[3] The term {0 - (SYSDIV - 1)} has a value between 0 and (SYSDIV - 1), depending on
    the alignment of the execution of the cache op with the system clock.

Table C.2 Primary Instruction Cache Operations



Details of the Fill_I Equation

These are the definitions for the Fill_I equation in Table C.2:

SYSDIV:  Number of processor cycles per system cycle; ranges from
         2 to 8.

ML:      Number of system cycles of memory latency, defined as
         the number of cycles the SysAD bus is driven by the
         external agent before the first double word of data
         appears.

D:       Number of system cycles required to return the block of
         data, defined as the number of cycles beginning when the
         first double word of data appears on the SysAD bus and
         ending when the last double word of data appears on the
         SysAD bus, inclusive.
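The Fill_I equation from Table C.2 can be evaluated with a short Python sketch such as the one below. The function names and the `alignment` parameter are hypothetical; `alignment` stands for the term that ranges from 0 to (SYSDIV - 1) depending on how the cache op lines up with the system clock. The sample values used afterwards (SYSDIV = 2, ML = 4, D = 4) are assumptions chosen for illustration, not figures from any particular system.

```python
def fill_i_cycles(sysdiv, ml, d, alignment=0):
    """Processor cycles for a Fill_I cache op, per the Table C.2 equation.

    sysdiv    -- processor cycles per system cycle (2 to 8)
    ml        -- system cycles of memory latency
    d         -- system cycles needed to return the block of data
    alignment -- clock alignment term, 0 to (sysdiv - 1)
    """
    if not 2 <= sysdiv <= 8:
        raise ValueError("SYSDIV must be in the range 2 to 8")
    if not 0 <= alignment <= sysdiv - 1:
        raise ValueError("alignment must be in the range 0 to SYSDIV - 1")
    return 10 + alignment + (2 * sysdiv) + (ml * sysdiv) + (d * sysdiv)


def fill_i_cycle_range(sysdiv, ml, d):
    """Return the (minimum, maximum) cycle count over all alignments."""
    return (fill_i_cycles(sysdiv, ml, d, 0),
            fill_i_cycles(sysdiv, ml, d, sysdiv - 1))
```

With the assumed values, `fill_i_cycle_range(2, 4, 4)` evaluates 10 + 0 + 4 + 8 + 8 = 30 cycles at best and 31 cycles at worst.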
The R4600/R4700 provides a means to reduce the amount of power
consumed by the internal core when the CPU would otherwise not be 
performing any useful operations. This is known as "Standby Mode" and 
is discussed in this appendix. 

Entering Standby Mode 

To enter Standby Mode, first execute the WAIT instruction. When the 
WAIT instruction finishes the W pipe-stage, if the SysAD bus is currently 
idle, the internal clocks will shut down, thus freezing the pipeline. The 
PLL, internal timer, some of the input pin clocks (Int[5:0]*, NMI*,
ExtRqst*, Reset* and ColdReset*) and the output clocks (TClock[1:0],
RClock[1:0], SyncOut, ModeClock and MasterOut) will continue to run.
If the conditions are not correct when the WAIT instruction finishes the W 
pipe-stage (i.e., the SysAD bus is not idle), the WAIT is treated as a NOP. 

Once the CPU is in Standby Mode, any interrupt, including ExtRqst* or 
Reset*, will cause the CPU to exit Standby Mode. 
This appendix identifies the R4600 and R4700 Coprocessor hazards.
In Table E.1, the number of instructions required between instruction A
(which places a value in a CP0 register) and instruction B (which uses the
same register as a source) is computed using the following formula:



(destination stage of A) - (source stage of B) - 1 



                       SOURCE                                    DESTINATION
Operation              Name                              Stage   Name                          Stage
---------------------  --------------------------------  ------  ----------------------------  ------
MTC0                   gpr rt                            2(A)    cpr rd                        4(W) α
MFC0                   cpr rd                            2(A)    gpr rt                        4(W) α
TLBR                   Index, TLB                        2(A)    PageMask, EntryHi,            4(W)
                                                                 EntryLo0, EntryLo1
TLBWI, TLBWR           Index or Random, PageMask,        2(A)    TLB                           3(D) β
                       EntryHi, EntryLo0, EntryLo1
TLBP                   PageMask, EntryHi                 2(A)    Index                         4(W)
ERET                   EPC or ErrorEPC, Status.ERL       2(A)    Status.EXL, Status.ERL        4(W) γ
                                                                 LLbit                         4(W)
CACHE Index Load Tag                                             TagLo, TagHi, ECC             3(D)
CACHE Index Store Tag  TagLo, TagHi, ECC                 3(D)
Instruction fetch      EntryHi.ASID, Status.KSU,         0(I)
                       Status.RE, Config.K0C, TLB
                       Status.ERL, Status.EXL            0(I) γ
Instruction fetch                                                EPC, Status, Cause            4(W)
exception                                                        BadVAddr, Context, EntryHi    1(I) δ
Coprocessor usable     Status.CU, Status.KSU,            1(R)
test                   Status.EXL, Status.ERL
Interrupt              Cause.IP, Status.IM, Status.IE,   2(A)
                       Status.EXL, Status.ERL
Load/Store             EntryHi.ASID, Status.KSU,         2(A)
                       Status.RE, Status.ERL,
                       Status.EXL, Config.K0C, TLB
Load/Store exception                                             EPC, Status, Cause,           4(W)
                                                                 BadVAddr, Context, EntryHi

Notes:

α There must be at least one instruction between an MTC0 and an MFC0.

β TLBWI and TLBWR instructions will cause a one-cycle slip.

γ Instruction fetches following an ERET will see a change in EXL or ERL in Stage 2 of the ERET, in anticipation
of the completion of the ERET. If the ERET does not complete, these instructions are killed before they commit
changes in state other than noted by δ.

The pipestage corresponding to the stage field is given in parentheses.

Table E.1 Coprocessor 0 Hazards
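The spacing formula given before Table E.1 can be applied mechanically, as in the Python sketch below. The function name `instructions_between` and the `STAGE` dictionary are hypothetical; the stage numbers are the ones shown in the table (0 = I, 1 = R, 2 = A, 3 = D, 4 = W). Flooring the result at zero is an assumption made here for cases where the formula would otherwise go negative.

```python
# Pipestage numbers as they appear in the Stage columns of Table E.1.
STAGE = {"I": 0, "R": 1, "A": 2, "D": 3, "W": 4}


def instructions_between(dest_stage_a, source_stage_b):
    """Instructions needed between A (writer) and B (reader).

    Implements (destination stage of A) - (source stage of B) - 1.
    Assumption: a negative result means no spacing is needed, so the
    value is floored at zero.
    """
    return max(0, dest_stage_a - source_stage_b - 1)


# MTC0 writes cpr rd in stage 4(W); TLBWI reads EntryHi in stage 2(A).
mtc0_then_tlbwi = instructions_between(STAGE["W"], STAGE["A"])

# MTC0 writes EntryHi.ASID in 4(W); instruction fetch reads it in 0(I).
mtc0_then_fetch = instructions_between(STAGE["W"], STAGE["I"])

# TLBWI writes the TLB in 3(D); a load or store reads it in 2(A).
tlbwi_then_load = instructions_between(STAGE["D"], STAGE["A"])
```

Under these table values, an MTC0 followed by a TLBWI needs one intervening instruction, an MTC0 that changes the ASID needs three before a dependent instruction fetch, and a load or store may directly follow a TLBWI.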
Certain combinations of instructions are not permitted because the
results of executing such combinations are unpredictable in the face of
events such as pipeline delays, cache misses, interrupts, and exceptions.

Most hazards result from instructions modifying and reading state in 
different pipeline stages. Such hazards are defined between pairs of 
instructions, not on a single instruction in isolation. Other hazards are 
associated with restartability of instructions in the presence of exceptions. 
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